4532-4552 Nucleic Acids Research, 2011, Vol. 39, No. 11 
doi:10.1093/nar/gkr036 



Published online 8 February 2011 



A novel immunity system for bacterial nucleic acid 
degrading toxins and its recruitment in various 
eukaryotic and DNA viral systems 

Dapeng Zhang, Lakshminarayan M. Iyer and L. Aravind* 

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 
Bethesda, MD 20894, USA 

Received December 23, 2010; Revised January 12, 2011; Accepted January 14, 2011 



ABSTRACT 

The use of nucleases as toxins for defense, offense 
or addiction of selfish elements is widely encoun- 
tered across all life forms. Using sensitive 
sequence profile analysis methods, we characterize 
a novel superfamily (the SUKH superfamily) that 
unites a diverse group of proteins including 
Smi1/Knr4, PGs2, FBXO3, SKIP16, Syd, herpesviral 
US22, IRS1 and TRS1, and their bacterial homologs. 
Using contextual analysis we present evidence that 
the bacterial members of this superfamily are poten- 
tial immunity proteins for a variety of toxin systems 
that also include the recently characterized 
contact-dependent inhibition (CDI) systems of 
proteobacteria. By analyzing the toxin proteins 
encoded in the neighborhood of the SUKH super- 
family we predict that they possess domains be- 
longing to diverse nuclease and nucleic acid 
deaminase families. These include at least eight 
distinct types of DNases belonging to HNH/ 
EndoVII- and restriction endonuclease-fold, and 
RNases of the EndoU-like and colicin E3-like cyto- 
toxic RNases-folds. The N-terminal domains of 
these toxins indicate that they are extruded by 
several distinct secretory mechanisms such as the 
two-partner system (shared with the CDI systems) in 
proteobacteria, ESAT-6/WXG-like ATP-dependent 
secretory systems in Gram-positive bacteria and 
the conventional Sec-dependent system in several 
bacterial lineages. The hedgehog-intein domain 
might also release a subset of toxic nuclease 
domains through auto-proteolytic action. Unlike 
classical colicin-like nuclease toxins, the 
overwhelming majority of toxin systems with the 
SUKH superfamily is chromosomally encoded and 
appears to have diversified through a recombination 
process combining different C-terminal nuclease 
domains to N-terminal secretion-related domains. 
Across the bacterial superkingdom these systems 
might participate in discriminating 'self' or kin from 
'non-self' or non-kin strains. Using structural 
analysis we demonstrate that the SUKH domain 
possesses a versatile scaffold that can be used to 
bind a wide range of protein partners. In eukaryotes 
it appears to have been recruited as an adaptor to 
regulate modification of proteins by ubiquitination 
or polyglutamylation. Similarly, another widespread 
immunity protein from these toxin systems, namely 
the suppressor of fused (SuFu) superfamily has been 
recruited for comparable roles in eukaryotes. In 
animal DNA viruses, such as herpesviruses, 
poxviruses, iridoviruses and adenoviruses, the 
ability of the SUKH domain to bind diverse targets 
has been deployed to counter diverse anti-viral 
responses by interacting with specific host proteins. 

INTRODUCTION 

The use of toxins as a defensive, offensive or selfish 
addictive strategy is observed across the tree of life. 
Interestingly, a diverse set of protein toxins from distantly 
related organisms have a propensity to catalyze nucleic 
acid modifying or cleaving reactions in their target cells. 
Well-characterized examples are known from across 
the phylogenetic spectrum: plants deploy toxins such as 
ricin, abrin and modeccin to protect their seeds, which 
are RNA N-glycosidases that remove a specific purine 



*To whom correspondence should be addressed. Tel: +1 301 594 2445; Fax: +1 301 480 9241; Email: aravind@ncbi.nlm.nih.gov 
Published by Oxford University Press 2011. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ 
by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. 






base from eukaryotic 28S rRNA to render it 
non-functional (1,2). In a similar vein, the fungal toxin 
α-sarcin, produced by fungi such as Aspergillus giganteus, 
acts as a specific endonuclease that cleaves the 28S rRNA 
at a position close to the site of action of the above plant 
toxins (3). Among animals the use of nucleic 
acid-targeting enzymes is observed in the venoms of 
snakes (4). Several animals, including vertebrates, 
are known to deploy cytotoxic RNases, such as RNase 
A, which potentially target RNA from bacteria and 
viruses (5). Bacteria are a particularly rich source of 
nucleic acid-targeting toxins, which are deployed in 
various contexts. Pathogenic bacteria secrete RNA 
N-glycosidases that target the 28S rRNA of eukaryotic 
hosts similar to the ricin-like plant toxins (6). Bacteria 
are also known to deploy RNase and DNase bacteriocins 
in intra- and possibly inter-specific competition that target 
molecules such as tRNA and genomic DNA (7). The best 
known are the plasmid-borne toxins of the model bacter- 
ium Escherichia coli, which kill closely related competing 
strains. Of these colicins E3, E4 and E6 cleave rRNA, 
colicins E5 and D cleave tRNA and colicins E2, E7, E8 
and E9 cleave DNA (8). Additionally, bacterial genomes 
are also colonized by systems such as the toxin-antitoxin 
systems and restriction-modification systems which 
produce enzymes that function as nucleic acid-targeting 
toxins (9-12). In these systems the primary function of 
the toxin is to kill the host bacterial cell if the toxin 
encoding system is genetically disrupted in some way 
(10,11). Thus, they act as selfish elements that forcibly 
'addict' the host to maintain them in genomes or 
plasmids. In many of these cases, organisms or genetic 
elements that produce the toxin also produce an antitoxin 
or immunity protein that renders the 'self' resistant to the 
action of the toxin. The study of these toxins and anti- 
toxins or immunity proteins has not only expanded our 
understanding of the evolution of inter-species competi- 
tion but also thrown considerable light on the biochemis- 
try of nucleic acids and other molecules that interact with 
them (9-12). In practical terms these nucleic acid-targeting 
toxins and antitoxins/immunity proteins are potential 
reagents that could be utilized in numerous biotechno- 
logical contexts ranging from chemical analysis of 
nucleic acids to bio-defense. 

Availability of an enormous wealth of genomic 
sequence data provides opportunities to identify novel 
versions of such toxins and associated immunity and 
delivery systems through computational analysis, thereby 
opening the door for new investigations on nucleic 
acid-modifying enzymes. The first step in this process 
requires detailed case-by-case analysis of protein se- 
quences and structures using the best available methods 
for detecting sequence and structure similarity. Results 
from such an analysis of protein structures need to be 
further combined with in-depth analysis of genomic 
contexts and domain architectures to glean novel func- 
tional associations. Finally, these results need to be 
placed in the context of phyletic patterns of the occurrence 
of various components of the system in order to recon- 
struct a total picture of their natural history and predict 
aspects of their biochemistry and biological functions. 



Indeed, such a strategy has allowed the prediction of 
novel biochemical activities and has laid the foundations 
for further systematic investigations of the toxin-antitoxin 
and peptide-modification systems of prokaryotes 
(9,13-15). 

In this article we present the results of such a strategy 
that helped us uncover and characterize a remarkable, 
diverse class of nuclease toxins, whose immunity appears 
to depend primarily on a protein superfamily prototyped 
by the Saccharomyces cerevisiae protein Smi1/Knr4. The 
Smi1/Knr4 protein was first recovered in a screen for 
S. cerevisiae mutants that confer resistance to the killer 
toxin produced by the competing yeast species Hansenula 
mrakii (16,17). Smi1/Knr4 was shown to physically interact 
with the tyrosyl tRNA synthetase and it appears to functionally 
interact with the non-ribosomal peptide ligase 
Dit1, with a tRNA-synthetase-like catalytic domain, in 
the efficient synthesis of dityrosine, a peptide metabolite 
that is typical of fungal spore-walls (18). Interestingly, it 
also shows synthetic lethal and physical interactions with 
a great number of proteins (19). Nevertheless, its exact significance 
and biochemical action have remained poorly 
understood (20). Parallel studies recovered other 
eukaryotic Smi1/Knr4 homologs, namely FBXO3, a 
subunit of an SCF-type E3 ubiquitin ligase in vertebrates 
(21), and PGs2, a subunit of the tubulin polyglutamylase, 
which is a non-ribosomal peptide-ligase that links multiple 
glutamates to the γ-carboxyl group of target proteins 
(22,23). Exploratory sequence surveys suggested that 
Smi1/Knr4 homologs are also abundantly represented in 
bacteria (Smi1/Knr4 domain, Pfam: PF09346). 
Furthermore, our preliminary contextual analysis of 
conserved gene neighborhoods of these representatives sug- 
gested that they might be functionally linked to potential 
nucleases. Very recently, a novel contact-dependent inhibi- 
tory (CDI) toxin system has been reported in 
proteobacteria that delivers multiple nuclease toxins into 
target cells (24,25). Our observations indicated that 
Smi1/Knr4 homologs are potential immunity proteins in 
a subset of these CDI systems. Together, these observations 
prompted us to systematically investigate both the bacterial 
and eukaryotic Smi1/Knr4 homologs and explore their potential 
connection to nuclease toxins, their delivery and 
immunity against them. As a result we were able to 
identify a diverse group of previously unknown nuclease 
toxins and immunity proteins that are present across all the 
major bacterial lineages with considerable significance for 
intra-specific and host interactions. This investigation also 
allowed us to uncover diverse, previously unknown 
nuclease and deaminase domains in bacterial toxins and 
predict their folds and biochemical mechanisms. We also 
show that the Smi1/Knr4 homologs, which were ultimately 
derived from bacterial toxin-immunity systems, have been 
recruited by eukaryotic double-stranded DNA viruses to 
perform multiple roles in intracellular survival and mor- 
phogenesis of these viruses. Finally, we present evidence 
that the ability of the conserved domain in the 
Smi1/Knr4 superfamily of proteins to bind structurally 
diverse protein partners has been re-used in eukaryotes as 
a means to recruit targets to peptide-modifying systems 
such as the ubiquitin and the polyglutamylase systems. 






METHODS 

Iterative sequence profile searches were run using the 
PSI-BLAST program (26) against the non-redundant 
(nr) protein database of National Center for 
Biotechnology Information (NCBI). Similarity based clus- 
tering for both classification and culling of nearly identical 
sequences was performed using the BLASTCLUST 
program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html). 
The HHpred program was used for profile-profile comparisons (27). 
Structure similarity searches 
were performed using the DaliLite program (28). 
Multiple sequence alignments were built by MUSCLE 
(29), PROMALS (30), KALIGN (31) and PCMA (32) 
programs, followed by manual adjustments on the basis 
of profile-profile and structural alignments. The consensus 
for the alignments was calculated and colored by the 
Chroma program (33). Secondary structures were 
predicted using the JPred and PSIPred programs (34,35). 
For earlier known domains the PFAM database (36) was 
used as a guide, though the profiles were often augmented 
by addition of newly detected divergent members that 
were not detected by the original PFAM models. 
Clustering with BLASTCLUST followed by multiple 
sequence alignment and further sequence profile searches 
were used to identify other domains that were not present 
in the PFAM database. Signal peptides and transmem- 
brane segments were detected using the TMHMM and 
Phobius programs (37,38). Contextual information from 
prokaryotic gene neighborhoods was retrieved by a PERL 
custom script that extracts the upstream and downstream 
genes of the query gene and uses BLASTCLUST to cluster 
the proteins to identify conserved gene-neighborhoods. 
Phylogenetic analysis was conducted using an 
approximately-maximum-likelihood method implemented 
in the FastTree 2.1 program under default parameters 
(39). The Modeller9v1 program (40) was utilized for 
homology modeling of the structure of the IRS1 
N-terminal domain. Structural visualization and manipu- 
lations were performed using VMD (41) and PyMol 
(http://www.pymol.org) programs. The in-house TASS 
package, a collection of PERL scripts, was used to 
automate aspects of large-scale analysis of sequences, 
structures and genome context (Anantharaman, 
V., Balaji, S., and Aravind, L., unpublished data). 
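
The gene-neighborhood step described above can be illustrated with a short sketch. The custom PERL script is not reproduced in the source, so the following Python code is an independent illustration rather than the authors' implementation. The `Gene` record, its field names, the locus tags and the cluster labels are all hypothetical; the cluster label stands in for the output of a BLASTCLUST-like clustering step, and strand orientation and circular replicons are ignored for simplicity.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    locus: str    # hypothetical locus tag
    start: int    # start coordinate on the replicon
    cluster: str  # similarity-cluster label (assumed to come from a clustering step)


def neighborhood(genes: List[Gene], query: str, flank: int = 2) -> List[Gene]:
    """Return up to `flank` genes on each side of `query`, in genomic order."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.locus == query)
    lo = max(0, idx - flank)
    return ordered[lo:idx] + ordered[idx + 1:idx + 1 + flank]


def conserved_neighbors(genomes: Dict[str, List[Gene]],
                        queries: Dict[str, str],
                        flank: int = 2,
                        min_genomes: int = 2) -> Dict[str, int]:
    """Count, per cluster, the number of genomes in which it flanks the query gene."""
    counts: Counter = Counter()
    for name, genes in genomes.items():
        # A cluster is counted once per genome, however many copies flank the query.
        seen = {g.cluster for g in neighborhood(genes, queries[name], flank)}
        counts.update(seen)
    return {c: n for c, n in counts.items() if n >= min_genomes}


# Toy data, invented purely for illustration.
genomes = {
    "genomeA": [Gene("a1", 100, "transporter"), Gene("a2", 900, "toxin"),
                Gene("a3", 1800, "SUKH"), Gene("a4", 2500, "kinase")],
    "genomeB": [Gene("b1", 50, "toxin"), Gene("b2", 700, "SUKH"),
                Gene("b3", 1500, "permease")],
}
queries = {"genomeA": "a3", "genomeB": "b2"}
result = conserved_neighbors(genomes, queries, flank=1)
print(result)
```

In this toy case only the "toxin" cluster flanks the query gene in both genomes, so it is the only neighbor reported as conserved. A real analysis would also need to account for gene orientation and operon boundaries.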

Species abbreviations used in the figures are: AHV1, 
Anguillid herpesvirus 1; Aave, Acidovorax avenae; Abau, 
Acinetobacter baumannii; Adef, Abiotrophia defectiva; 
Ahyd, Aeromonas hydrophila; Ahyd, Anaerobaculum 
hydrogeniformans; Amar, Acaryochloris marina; Aory, 
Aspergillus oryzae; Apla, Arthrospira platensis; Asp., 
Anaeromyxobacter sp.; Atha, Arabidopsis thaliana; Bamb, 
Burkholderia ambifaria; Bamy, Bacillus amyloliquefaciens; 
Bant, Bacillus anthracis; Bbac, Bdellovibrio bacteriovorus; 
Bcen, Burkholderia cenocepacia; Bcer, Bacillus cereus; Bcyt, 
Bacillus cytotoxicus; Bflo, Branchiostoma floridae; Bgra, 
Bartonella grahamii; Bmar, Blastopirellula marina; Bmcb, 
Brevibacterium mcbrellneri; Bmyc, Bacillus mycoides; Bpse, 
Bacillus pseudofirmus; Bpse, Burkholderia pseudomallei; 
Bpum, Bacillus pumilus; Bsel, Bacillus selenitireducens; 
Bsp., Bacillus sp.; Bsp., Bacteroides sp.; Bsp., Beggiatoa 



sp.; Bsub, Bacillus subtilis; Btha, Burkholderia 
thailandensis; Bthu, Bacillus thuringiensis; Btri, Bartonella 
tribocorum; Bvie, Burkholderia vietnamiensis; CHam, 
Candidatus Hamiltonella; CKor, Candidatus Koribacter; 
CV, Crocodilepox virus; Cace, Clostridium acetobutylicum; 
Caci, Catenulispora acidiphila; Cbac, Campylobacterales 
bacterium; Cbei, Clostridium beijerinckii; Cbol, 
Clostridium bolteae; Cbot, Clostridium botulinum; Ccar, 
Clostridium carboxidivorans; Ccel, Clostridium 
cellulolyticum; Ccel, Clostridium cellulovorans; Ccol, 
Campylobacter coli; Cdip, Corynebacterium diphtheriae; 
Cgle, Chryseobacterium gleum; Cgra, Campylobacter 
gracilis; Chom, Cardiobacterium hominis; Cjap, Cellvibrio 
japonicus; Clen, Clostridium lentocellum; Clep, Clostridium 
leptum; Cmic, Clavibacter michiganensis; Csak, 
Cronobacter sakazakii; Csho, Campylobacter showae; 
Csp., Chloroflexus sp.; Csp., Clostridium sp.; Csp., 
Cyanothece sp.; Cspu, Capnocytophaga sputigena; Ctro, 
Candida tropicalis; Ctur, Cronobacter turicensis; Cure, 
Corynebacterium urealyticum; Ddad, Dickeya dadantii; 
Ecol, Escherichia coli; Efer, Escherichia fergusonii; Erec, 
Eubacterium rectale; Even, Eubacterium ventriosum; Exsp, 
Exiguobacterium sp.; FAV1, Fowl adenovirus 10; Faln, 
Frankia alni; Fmor, Fusobacterium mortiferum; Fsp., 
Fusobacterium sp.; Fsym, Frankia symbiont; GHV2, 
Gallid herpesvirus 2; Gaur, Gemmatimonas aurantiaca; 
Ghae, Gemella haemolysans; Gint, Giardia intestinalis; 
Gsp., Geobacillus sp.; Gsp., Geobacter sp.; Gvio, 
Gloeobacter violaceus; HHV5, Human herpesvirus 5; 
HHV7sJ, Human herpesvirus 7 strain JI; Hasp, 
Halobacterium sp.; Haur, Herpetosiphon aurantiacus; 
Hbor, Halogeometricum borinquense; Hche, Hahella 
chejuensis; Hoch, Haliangium ochraceum; Hpyl, 
Helicobacter pylori; Hsap, Homo sapiens; Hsom, 
Haemophilus somnus; Iloi, Idiomarina loihiensis; Kalg, 
Kordia algicida; Kfla, Kribbella flavida; Krad, Kineococcus 
radiotolerans; Kset, Kitasatospora setae; Lara, 
Lentisphaera araneosa; Lgoo, Leptotrichia goodfellowii; 
Ljoh, Lactobacillus johnsonii; Lpla, Lactobacillus 
plantarum; Lsph, Lysinibacillus sphaericus; Mabs, 
Mycobacterium abscessus; Maur, Micromonospora 
aurantiaca; Mcat, Moraxella catarrhalis; Mext, 
Methylobacterium extorquens; Mgil, Mycobacterium 
gilvum; Minf, Methylacidiphilum infernorum; Mlep, 
Mycobacterium leprae; Mmar, Microscilla marina; Msp., 
Micromonospora sp.; Msp., Mycobacterium sp.; Mxan, 
Myxococcus xanthus; Ndas, Nocardiopsis dassonvillei; 
Nmen, Neisseria meningitidis; Nmuc, Neisseria mucosa; 
Nmul, Nakamurella multipartita; Nsic, Neisseria sicca; 
Oana, Ornithorhynchus anatinus; Osin, Oribacterium 
sinus; Patl, Pseudoalteromonas atlantica; Patr, 
Pectobacterium atrosepticum; PbCVN, Paramecium 
bursaria Chlorella virus NY2A; Pcar, Pectobacterium 
carotovorum; Pcry, Psychrobacter cryohalolentis; Pdag, 
Pasteurella dagmatis; Plum, Photorhabdus luminescens; 
Pmar, Planctomyces maris; Pmar, Prochlorococcus 
marinus; Pmel, Prevotella melaninogenica; Pmir, Proteus 
mirabilis; Pmul, Pasteurella multocida; Pput, 
Pseudomonas putida; Prum, Prevotella ruminicola; Psp., 
Paenibacillus sp.; Psp., Prevotella sp.; Pstu, Providencia 
stuartii; Psyr, Pseudomonas syringae; Ptim, Prevotella 






timonensis; Ptor, Psychroflexus torquis; RHV1, Ranid herpesvirus 1; 
RbIV, Rock bream iridovirus; Rcat, Rana 
catesbeiana; Rery, Rhodococcus erythropolis; Rfla, 
Ruminococcus flavefaciens; Rinu, Roseburia inulinivorans; 
Rsol, Ralstonia solanacearum; Salb, Streptomyces albus; 
Save, Streptomyces avermitilis; Sbal, Shewanella baltica; 
Sbin, Streptomyces bingchenggensis; Scla, Streptomyces 
clavuligerus; Scoe, Streptomyces coelicolor; Sent, 
Salmonella enterica; Sgri, Streptomyces griseoflavus; Sgri, 
Streptomyces griseus; Shyg, Streptomyces hygroscopicus; 
Sisp, Silicibacter sp.; Slin, Spirosoma linguale; Smut, 
Streptococcus mutans; Snas, Stackebrandtia nassauensis; 
Sone, Shewanella oneidensis; Spie, Shewanella 
piezotolerans; Spom, Schizosaccharomyces pombe; Spri, 
Streptomyces pristinaespiralis; Srot, Segniliparus rotundus; 
Ssal, Salmo salar; Ssp., Streptomyces sp.; Sspi, 
Sphingobacterium spiritivorum; Sste, Sagittula stellata; 
Ssui, Streptococcus suis; Ssvi, Streptomyces sviceus; StIV, 
Soft-shelled turtle iridovirus; Ster, Sebaldella termitidis; 
Stro, Salinispora tropica; Svir, Streptomyces 
viridochromogenes; Swol, Syntrophomonas wolfei; Taue, 
Tolumonas auensis; Tcur, Thermomonospora curvata; 
Tfus, Thermobifida fusca; Tsp., Thauera sp.; Tthe, 
Tetrahymena thermophila; Ttur, Teredinibacter turnerae; 
Valg, Vibrio alginolyticus; Vcho, Vibrio cholerae; Veis, 
Verminephrobacter eiseniae; Vmet, Vibrio metschnikovii; 
Vmim, Vibrio mimicus; Vpar, Vibrio parahaemolyticus; 
Vsp., Vibrio sp.; Vspl, Vibrio splendidus; Vvul, Vibrio 
vulnificus; Wend, Wolbachia endosymbiont. 
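
When using this list programmatically, note that some abbreviations are shared by more than one species (for example Ahyd and Bpse), so an abbreviation alone does not always identify a species. The following is a minimal Python sketch, assuming the list is available as a semicolon-separated string of "abbreviation, species" pairs; the function name `parse_abbreviations` is hypothetical and the excerpt is copied from the list above.

```python
from collections import defaultdict
from typing import Dict, List

# Short excerpt of the abbreviation list, including two shared abbreviations.
excerpt = ("Ahyd, Aeromonas hydrophila; Ahyd, Anaerobaculum "
           "hydrogeniformans; Amar, Acaryochloris marina; "
           "Bpse, Bacillus pseudofirmus; Bpse, Burkholderia pseudomallei")


def parse_abbreviations(text: str) -> Dict[str, List[str]]:
    """Map each abbreviation to every species name listed for it."""
    table: Dict[str, List[str]] = defaultdict(list)
    for entry in text.split(";"):
        entry = " ".join(entry.split())  # collapse line breaks and extra spaces
        if not entry:
            continue
        abbrev, species = entry.split(",", 1)
        table[abbrev.strip()].append(species.strip())
    return dict(table)


table = parse_abbreviations(excerpt)
# Abbreviations that map to more than one species need the gi number to disambiguate.
ambiguous = {k: v for k, v in table.items() if len(v) > 1}
print(ambiguous)
```

Storing a list of species per abbreviation, rather than a single value, keeps the shared abbreviations visible instead of silently overwriting one species with another.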

RESULTS AND DISCUSSION 

Sequence profile searches and structural comparisons 
reveal a vast superfamily of Smi1-related proteins 

As a first step to computationally characterize the 
Smi1/Knr4 protein, we analyzed it using the SEG 
program to identify potential globular regions (42). 
This indicated the presence of a single globular domain 
that was then used as a seed in iterative sequence profile 
searches of the nr database with PSI-BLAST and 
JACKHMMER from the HMMER3 package. In 
addition to recovering other eukaryotic proteins with a 
homologous region, such as FBXO3 from animals, 
SKIP16 from plants and PGs2, a subunit of tubulin 
polyglutamylase complex, the search also recovered a 
large number of bacterial proteins such as the Bacillus 
subtilis YobK. Given the great diversity of sequences 
recovered prior to convergence from bacteria, we initiated 
transitive sequence profile searches with several distinct 
bacterial starting points to achieve maximal coverage in 
terms of detection. We also noted that a crystal structure 
for YobK has been solved by the joint structural genomics 
initiative (PDB: 2PRV). We used this structure as a query 
for structure similarity searches using the DaliLite 
program and recovered hits to four other homologous 
structures (3FFV, 2PAG, 2ICG, 3D5P; Z>7.5). Of 
these, 3FFV was the structure of the earlier characterized 
protein Syd from E. coli which interacts with SecY, a key 
component of the Sec-dependent protein secretion system 
that traffics proteins across the bacterial inner membrane 



(43-45). Consistent with this, we also found that Syd 
homologs were recovered with borderline e-values 
(e ~ 0.03-0.05) in the above JACKHMMER and 
PSI-BLAST searches. Hence we included the Syd 
homologs in the profiles to further expand the relationships 
of the group of proteins homologous to Smi1/Knr4. 
At convergence, some of these searches also recovered, 
with borderline e-values (e ~ 0.05), proteins from certain 
DNA viruses such as FPV250 (gi: 9634920) from the fowl 
poxvirus, and the US22 family of proteins (e.g. US22, 
UL26, IRS1 and TRS1) from herpesviruses. To confirm 
the relationship of these proteins to Smi1 we used them in 
a profile-profile comparison search with the HHpred 
program against a library of HMMs created using the 
sequence of polypeptides in the PDB database as a 
query. These searches recovered the structures 2PRV, 
3FFV and 2ICG as the best hits with significant 
P-values (P = 10⁻⁴ to 10⁻⁸). Furthermore, examination of 
the hits produced by the viral proteins in profile-profile 
comparisons showed that most of the versions from 
herpesviruses possessed two tandem repeats of the 
domain homologous to Smi1. Additional transitive 
searches with these viral proteins revealed that homolo- 
gous proteins are present in a number of distantly related 
or unrelated DNA viruses. Finally, the above searches 
also recovered hits to two distinct groups of proteins 
each with over 100 representatives in the nr database, pre- 
dominantly from bacteria, typified respectively by 
CA_C3700 (gi: 15896931) from C. acetobutylicum and 
SGR_4389 (gi: 182438182) from S. griseus. Profile- 
profile comparisons with the HHpred program using 
alignments of each of these groups of proteins also confirmed 
their relationship to the Smi1-like proteins via 
recovery of significant hits (e = 10⁻⁴ to 10⁻⁶) to HMMs 
generated using the sequences of 2PRV and 3FFV as 
best hits. Thus, it became clear that Smi1/Knr4 defines a 
large superfamily of conserved domains that is widespread 
in bacteria, eukaryotes and various DNA viruses but prac- 
tically absent in currently sequenced archaeal genomes. 
We accordingly named it the SUKH (for Syd, US22, 
Knr4 homology) domain superfamily. 
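
The transitive search strategy used in this section can be summarized schematically: each confidently recovered sequence becomes a new starting point, and hits with borderline e-values are set aside for independent confirmation (here, by profile-profile and structure comparisons). The following Python sketch illustrates only that bookkeeping. The `search` callable is a hypothetical stand-in for a PSI-BLAST or JACKHMMER run, the inclusion threshold of 0.01 is an assumption not stated in the text, and the toy hit table uses invented identifiers and e-values rather than results from this study.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Hit = Tuple[str, float]  # (sequence identifier, e-value)


def transitive_search(seeds: List[str],
                      search: Callable[[str], List[Hit]],
                      threshold: float = 0.01) -> Tuple[Set[str], Dict[str, float]]:
    """Expand from the seeds until no new confident hits appear.

    Returns the accepted identifiers and the best e-value seen for each
    borderline hit that was never accepted.
    """
    accepted: Set[str] = set(seeds)
    borderline: Dict[str, float] = {}
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for hit_id, evalue in search(query):
            if hit_id in accepted:
                continue
            if evalue <= threshold:
                accepted.add(hit_id)
                borderline.pop(hit_id, None)  # promoted from borderline, if present
                queue.append(hit_id)          # use it as a new starting point
            else:
                best = borderline.get(hit_id)
                if best is None or evalue < best:
                    borderline[hit_id] = evalue
    return accepted, borderline


# Invented toy hit table: each query maps to its hits and e-values.
toy = {
    "seed": [("A", 1e-8), ("B", 1e-5), ("X", 0.04)],
    "A": [("seed", 1e-8)],
    "B": [("seed", 1e-5), ("C", 1e-4), ("Y", 0.05)],
    "C": [("B", 1e-4), ("X", 0.03)],
}
accepted, borderline = transitive_search(["seed"], lambda q: toy.get(q, []))
print(sorted(accepted), borderline)
```

In the toy example, C is reached only through B, which is the point of searching transitively, while X and Y remain borderline candidates. Real profile searches rebuild a position-specific profile at each iteration, which this sketch treats as a black box.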

Structural features and internal diversity of the SUKH 
domain superfamily 

Despite the low average pairwise sequence similarity 
across this superfamily, all representatives are known or 
predicted to possess a similar core fold comprising four 
conserved helices and six strands (Figure 1, 
Supplementary Data). Strands 1 and 2 form a β-hairpin 
and strands 3-6 form a 4-stranded β-meander; 
however, the β-hairpin and the β-meander show only 
limited or no hydrogen-bonding along their length, 
despite being spatially adjacent to each other. Thus, the structural 
core of the SUKH domain can be described as a split 
β-sheet with only weak interaction between its two parts. 
This structural peculiarity could potentially be critical for 
the functional interactions of the domain (see below). 
Based on sequence-similarity-based clustering and phylo- 
genetic analysis five major groups can be recognized 
within the SUKH domain superfamily (Figure 1, 






[Figure 1. Multiple sequence alignment of the SUKH domain superfamily, with secondary-structure annotation (helices H1-H3 marked) and a 60% consensus line. Sequences are labeled by gene name, species abbreviation and gi number, and are grouped as: SMI1/KNR4 group (SUKH-1), SYD group (SUKH-2), SUKH-3 group, SUKH-4 group and US22 group (SUKH-5).]



-MPQSL- 
VPKEF- 
QPYDN- 
DPRVP- 

LPEWC- 
LPHTV- 
-MPVFD- 
PVEV- 
-LPREL- 
-LPDDE- 
PDGT- 
-LPHSE- 
-IPARP- 
-LPNST- 
PGDG- 

p:wi<- 



hh hhhhhhhhhhh eee 
---apdgglnlddfmrrqr-grhldl- 
---salgkvalryavrklm-krgarl- 
---lldrwrppktsrpwkp-gqrval- 
- - - a efglgclealvri na-bqvl pv- 
---erasgvhlqryvrata-rrwlpl- 
---vrrdyeglrrylrrfe-sscvsl- 
— -ragdfmglndfleqec-gtrlhv- 
---tadarkdlqklvrrvs-Btvlrl- 
---anstgramrkwsqrda-gtllpl- 
---rqydppalrtyvqrhq-ettval- 
---cqysadvlrdfvrnhv-dqvlsl- 

fcle pmeitryvhrne -@rclsl- 

cqds rqdi rh l vrs ya - dmn i sl- 

- - -GCMHIH AFAQfRATYR -g RLMVM- 
- - -W SLGLDTMARjl I RH H -RQFMP L - 

---ilhdrtnllrvcqlha-eskrrl- 
— acedtaarca|veahr-eaqltl- 

yrgdveslsaevtk rg-yasyti- 

- - - lard l lalwrlcmkre- 

aeeatalgrelrrrwa 

dptdaaavaktarems- 

---hghdadalaacvgehh- 
---ndttpegsyataeril 

hdnsiselqrvtcryr 

tiadlegirelvrkfr- 

vqatfeelpqlvddrr 

K PGD LAVVSBjVKS N T _ 

_ iKLHP 

PLAGPDLI AtjADRHI 

YWGTLEEID1 _ 

QTIYRVCISNKMESTH-gTTIAL 
---SAKWRAEGDRiLEKFR-ECKLRI 
AVEFLGQIKEIVSKHR-NNKILL- 
PPGSLDALSGlVSRNK-gKTVGL 

ayksptklqa|fqeny-@isvpl- 
aykspkklkaIfqeyr-kttlav- 
adshdllvdn|venfs-ddlirp- 

shhh . .shppalpphs. thplsh, 



'RDVNF- 
'CDVKL- 
'ECVPL- 
RLLPF- 



-1--GHLYFGNVR 

-1--DFEFYEFT 

YALTFYHN 

KYFDVPDEA 

-5--IKFFNKEHG '. 

-1--PNMFFGEED 

-5- -QELKISFAP 

PQEPPLG 

- 1- - I LFTVDVSV 

-1- -GFFALHLPE : 

-1--FEITFRMPV 

VLLRQPGED 

VSTSREDIE 

-1--PWFDLVDGS 

AGGLVDVDE 

FMFYGPAPE 

-1--PRIGFNSTH 

eeeeeee 
4-GYTL FVCDVE ET- - 
4-GLYVCICOPSYE- ■ 
4-DRCLVIRRRWRL- 
4 -GWN LVLQEI ETD- 
4-LHGIMLGDTQYF- 
4-QCIYWGGEHSP- 
4-RCFIQLRSRSAL 
4-GWFFTYCDLLRV- 
4-GFYARVTPR5QM- 
4-NWFLVMREQAAI 
4-NWFLQVRPGSTH 
4-GWHVMLRTEDGI 

6- GWRLDFVEFEDI- 
2-YSVIRVSTIRLY- 

7- LRLCNLRCFENS 
4-ACTFTFGSWNVA- 
4-KHSLVLRTARDL 
9-THFAFWTHNTEV- 
4-EALTTWLNGSQ 

6- LQPMVLLGAWQE- 

8- EPLLMLGSIEGL 

7- KTLKLLTSFGCL 
7-VIMRLMKTV(JQL 

7-RPLLLCGQAENL 
4-DFCLRIGGLEAS 
2-GIVLRIAPLDGT 



hhhhh 

-QTISQAVDTG- 

-SFRKTVI 

-TLYYFTA 
--VYQALDFLEP- 

-TNLDEKFV5E- 
■-WLPELLTQ--- 

■ -VGYNSLIGWG- 
-LLFHEALPLQ- 
-ELFSTLSIEL- 
-PDLAEYLAGL- 
-TFDAAFIWDR- 

■ - PL VQGWS VG- 
-TDLDAISLGS- 
-RTLGDAYDDL- 
-PPMSEHEY FG- 
-PTLPQRWAES- 

--DS LCRYTG EE- 



--6-PEVRDDIVI0AGM GGFACMSR-- 

KEVQFIKVWNCS 2 - -EYGLYLKE-- 

NSKRYIVF PHDL EGHFAIEE-- 

--3-GAHGEFVKF STG 1--FGSILWD p— 

5-EKFR5YYVV JDNL 2 - -GNLIAIQE- - 

NEKLAVL ?SDR---1--DNHICTST-- 

8-ARLGEQLVI rQE"- - 
1 RE 

2-QGPRPYWL rASP---3--ALLFCLDA-- 

- -9-AALLGQRLI JTDG WALLTVDT-- 

-9-AGVESWKI AFP INAWIOP-- 

24-GGVGGWPV1 ULL HAHLVLDP-- 



■-VASLCVDR-- 
- DIHGAYIA1 REI---3~GLHVAK!K— 



EVY-- 




hhhhhhhhh eeeeeeeee 

- 2 - TPRDVE YWKL LV VTQGQL R^ 

-5-LSKISWLERHCP PLDQELII 

- 3 - EGRDAQRLASYL CCPEPLR 

- 3 -KPEDVKAWSHYL CCQTRLA 

-3-ROHKTYRRFSCL RQ AGRLYI 

-4- EIDLEHCQNDFF GEFRALHLI 

- 5 -KMGTVC5QGAYV CCQEYLHPF -, 

-4-HLNIKGLEKTFL CCDKFLLPV ?TVS- 

NGVGATOLRQLS PRDAWIVLV JTW 

-1-QIYARSLAADYL CCDDTLEAV eVI 

PELRDQLLDDVI CCPERLIVL V 

- -"k « 

■i h/L 

-1-WSEINDWRVMV GSNHVEPL SwL\, 

MRVLGLGTV SLKC 



LEQAHFVVI ?WME 

-2-SASMLRRFQRSL YTREPVMPL EIE 

16-IQTMELMIR7VP RITCYHQLL ALG- 

-5-AHGNWLKETCSL NVLQVFWF JVPV- 

13 -A5LLTAVRRH LN 1-RLCCGWLAL AVL 



IR-gRHL 1 
IR-RrVAl 
"H-gTTI/ 



ILSGT 
- 7 - DMYI RI CG LDGT 
-4-RCCLRVGDLSST 
-3-KHFLVIGDLEGT 
-4-GATLTFASAADT- 
- 5 - NAYME VCAL ADT 
-4-SYELRIGGLEDT 
-1-DARLRVCDLGGT 
-4-DIFTPLEEGSSD- 
-4-ERFTPWKGGSSD 
-7-NLHIRICGLDGT 

....hhbhbh.h.s.. 



ll-AOFRDLLNFIRQ RLCCEWYVY LVG- 

11-ARLAECEMYVTL QLRCRWYLL AVG 

11-VESRRWWWAVRA NLATPWYVL rVTG' 

11-EQFNDLLKFFVO RLCCETMIN I WC 

-5-EELLEYCEALYL PQHV KMEI\ IVC 

-5-DDIVSAWETLFL PTPEKMVVL AID 

- 5 -TFMLESWRVL YL PKPTQMEVL ?TLE 

- 5 -QYTLESWEELYL PE PTKMEVF£\ 

- 5 - EHQMEDWAYL Y L PRPQRLAVI 

-4-QHLKEYCEVLYL PSPKRMTV: 

- 3 -SNTIRGWSKFYL PD WQME W fei . , 

-5-PQWEDWNSCYL ET5VNFMVV RCA- 

-5-GEQLEVWKDCYL PERM EMVVI :Al 

- 5 -ERTLEFWENLYL PSPERMRVt ™* 1 

-2-RFRVWKCSFAYV NPPEKLTV: 

-2-RFRLLECSFGYV NPPEKLIV! 

- 5 -NYRLESWGQN FL PKFTKMKVLCiIE 

hbphbbhhb hsppbbslthhs 



.AVL : 
'TV I ■ 
fH ■ 



eeeeee 
--6-WDRSVAGVA-- 
--9-TRQLVLFMT-- 
-11-NVPSELYLG-- 
- 10 - KKTAVC L IS— 
--8-NTAPEIWVS-- 
- - 2 -TCRYQVFVD- - 
--5-RYQLIVLIG-- 

- - 6 - RPPLPVLIG-- 
-23-QDGLYLALG-- 
-16-ELPCVLML5-- 
--8-ETELVLCMG-- 
- 15 -ATPYWFMG-- 
-37-QARLVLLLG- 
--2-YDVINLFVD-- 
-11-VDLIPIWA-- 

PVN KAVFMD-- 

--7-FHRVRILCG-- 
- - 9 - LVRQYVLVO - - 
--4-RLDISI LVN-- 
112 -RAMLWVLD-- 
- - 8 -VPSGLVLLD- - 
--8-DTSFLIIFD-- 
--8-VAEVLVLLD-- 
--8-HADFVILVD-- 
--7-GMQLLILVA-- 

— 6-ICQVIVLVG- 
--7-FPEC1VLVG-- 
- - 7 - F PQWIVLVG - - 
-14-WQQLVI LVS-- 
--7-GLQWILVA-- 
- -4 - SDQMVL MTC - - 
--6-TRRPWLLA-- 

-7-GWQLVLLLC-- 
-7-AEQLVILVG-- 

DDAVVMVGT-- 

DDAWMVGT-- 

-7-FPQWIVLVA-- 

sphlllhs.. 



eeeeee 

-AOgSVLCYEI- 
PKWD\ AYDS 
-ASgA^ LWTD- 
-DE5YV CYVR- 
-GHgHA AYLP- 
-AYHA\ ayda- 
-QRSGVYCYDD- 

- LG>R\ VY5P- 
-AGFRV VYDL- 
-HYETV VYDW- 

- GGTRl IYEP- 
-RF5RV AYDT- 

- RYE TV CLDR- 
-DCMRV AVNfv 

YTgAV ACDV 
-AH£Gi VLLY- 

-dtStwaal v- 

-TF£VV GYDP- 
-ESWAV GVMP- 

- ELKAV GYCP- 
KrEvV LHKI 

-RFCRF WIV- 
-WFgAV AIQM- 
-RACFF YFDV- 
-EGgEV AYEE- 
-tCrCV FVDA- 
- EDQCV AYGD- 
-ECgHV gyed- 

-efEtv aadc- 
■ fdktv gyee - 

-GDgHWVYDG- 
-D5£RI LYYK- 

-ecPnvyayef- 
-egsav afee- 

-LSgKV mukd- 
-QSgKVYLDKD- 
-kt^fRV GY'K r - 

. pstplahbs. . 



-4-RKFALVN- 
3-PLIYTLN- 
7-YSLIFCN- 
-2-EHVVLVN- 
-5-WTYFVN- 

SNLDFVE" 

-3-PPQMFVN- 
-3-QKPTYMN- 
-2 -PNLETVN- 
-2 -CEARYCN- 
-2-KELIPVH- 
2 -ETVRGLH- 
•2-SGYTSLH- 
-2-SEIYPLN- 
-4-DEAALLN- 
- 2 -DHVVPVN- 
-7-RDRGLVE- 

eeeeee 
-2-ENFWRg- 

GILFFL0- 

-4-DSLTFVG- 
'LE- 




-2 -RQAHFLjj 



289 

hhhhhhhhhhhhhhh 
--tslerbletayqlrlkfk 136 
--qnlrtbflfhlikqeism 122 
--tninhIisfnnififmvl 119 
--tnlerIslcvrglieryp 13 3 

--5SF5kIhE5MNYFKKMKN 166 

--tnpallgkaleefqacin 113 
■-ssvlcagqslqaavawsq 198 
--asidsllafmdrvlrfkq 122 
--stfaa bveflyrlgqlia 133 
-adltaJtrclallaehrp 377 
-gdlsslartvssfleyig 138 
--tdvsslvevafrfqrlle 154 
--rdveslvyalvefrklev 15 7 
-kdlpslanflyllekerp 138 
--sditclayflyildrahq 147 
--advstlaftlwlhqreqe 354 
--qgldslavllgmwavtk 214 

hhhhhhhh 
-adslqqllergllhsyfed 12 3 
■-pswac*/hgaivleywna 154 
-esitellniglrrcnfit 135 
-rnlmemarvglravetlh 190 
-lsfgeIfenglfavysff 197 
--sdlagIfakgmircdpvh 169 
-ptmkd|lrkgfrhcdhfh 135 
-rsgfrglvqeglrnyaplr 245 
-rdadeIfrhgagevvrvy 247 
--sdika|s<ngllwceyvy 179 
- -rh lde l arygmmyt eav y 158 
-hnldelarygvsrseiay 169 
hslddlarhgllhceaiy 179 
■rgmtrcheng 256 
iclglnll fen r 336 
rqgsfwfrcpr 331 
'rvgl alliddf 332 
e dv vm ftcvmgk kgh rnh r 296 
-rgllgifrvgflrfcnny 382 
aelshIlragvlgalalg 510 
-dnfhi^lkcgllklrglc 333 
-dsleelfraglmkvyvrr 322 
-ntitelfrmgllkmvfrh 333 
-dsvdmlltvgllkiyqag 332 
-tsf5e|leigvkslgrev 167 
psiaeldtnvspttppias 121 
-ysvkqlveeg1qetgisy 153 
-ds lqk l vkdgi ktdvhyd 186 
-ssleqblrdgvlnlgref 172 
-stipelfrigmqnfgtev 131 
tggmeqlrrrgley pasks 128 
-yslqhlleegvcnlnyad 12 3 
-rsledlfetgaefpglek 198 
-ptvasllqvglrrfglka 12 3 
-dtlsdiaaygcpndptff 275 
-dtlsdIadqgcpqnftff 265 
-nsleylvkdgiknietyt 136 
.sslppabphhbbhhhhhh 











2PRVA 
(SMI1 domain) 



2ICG 
(SMI1 domain) 



3D5P:A 
(SMI1 domain) 



3FFVA 
{SYD domain) 



IRS1-N model 
(US22 domain) 



SMI1_Spom_1 911 3060 (smtT) 

FBX03_Hsap_1581218 (Fto 
Scoe_21221555 

aro_25572B254 Qmii^SMI? ) 

Cspj_2 13961789 ^SIVll1^[~Sm-u ] 
Chom_258544654 
C8p._160893417 
Bcen_170736423 

SYD_Sone_2437317S ^sylT) 

CF-44_Caee_15026805 I gUKH^ ) 

Svir_256803980 ^uk^3)eukh^ 

Sbln_297162ilu2 f^UKHj) 

Ssvi_297200035 ^ukh^[|ukhj) 

IRS1_HW5_2223S456fl (us2?yJjS22) 
UL3B_HHV5_242345S56 ( uszz) 

U5_HHV7_1 722827 fTuS2?XjJS^X 



Figure 1. (A) Multiple sequence alignment of the different groups of the SUKH superfamily. The consensus was derived using the following amino
acid classes: a, aromatic (FHWY, black on orange); b, big (EFHIKLMQRWY, black on light blue); h, hydrophobic (ACFGHILMTVWY, black on
yellow); l, aliphatic (ILV, black on dark yellow); p, polar (CDEHKNQRST, blue on gray); s, small (ACDGNPSTV, black on blue); t, tiny (AGS,
white on dark blue). Secondary structures derived from PDB structures or predicted using the Jpred program are indicated above the alignment ('e' in
blue, β-sheet; 'h' in red, α-helix). The numbers in brackets indicate the residues excluded from the sequences. (B) Cartoons of known structures
and a homology model of the US22 IRS-N domain made using Modeller are shown in approximately similar orientation. The α-helices are shown in
purple, β-sheets in yellow, and loops in gray. Surface diagrams are colored based on their positions relative to the center of the structure (outside to
inside: blue to red) to illustrate the cleft. (C) Domain architectures of representatives of the SUKH superfamily. In addition to the domain abbreviations
already provided in the text: the Ig-fold domain overlaps with PFAM DUF525; MoeA is a domain found in the
MoeA protein of the molybdopterin biosynthesis pathway; U5, herpesvirus U5-like family (PF05999).
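
The class-based consensus described in the caption can be illustrated with a minimal sketch. The class memberships are taken from the caption; the 70% threshold, the rule of trying the smallest class first, and the function names are assumptions made for illustration and are not stated to be the authors' procedure.

```python
from collections import Counter

# Amino acid classes as listed in the Figure 1 caption.
CLASSES = {
    "a": set("FHWY"),          # aromatic
    "b": set("EFHIKLMQRWY"),   # big
    "h": set("ACFGHILMTVWY"),  # hydrophobic
    "l": set("ILV"),           # aliphatic
    "p": set("CDEHKNQRST"),    # polar
    "s": set("ACDGNPSTV"),     # small
    "t": set("AGS"),           # tiny
}
# Assumption: report the most specific (smallest) class that passes the threshold.
ORDER = sorted(CLASSES, key=lambda name: len(CLASSES[name]))


def column_consensus(column, threshold=0.7):
    """Return a residue, a class letter, or '.' for one alignment column."""
    residues = [r.upper() for r in column]
    total = len(residues)
    if total == 0:
        return "."
    # A single residue conserved above the threshold is reported as itself.
    counts = Counter(r for r in residues if r.isalpha())
    if counts:
        residue, n = counts.most_common(1)[0]
        if n / total >= threshold:
            return residue
    # Otherwise fall back to the first class covering enough of the column.
    for name in ORDER:
        members = CLASSES[name]
        if sum(r in members for r in residues) / total >= threshold:
            return name
    return "."


def alignment_consensus(rows, threshold=0.7):
    """Build the consensus line for equal-length aligned sequences."""
    return "".join(column_consensus(col, threshold) for col in zip(*rows))
```

For a toy alignment such as `["AIKF", "GLRY", "SVEW", "AI-H"]`, `alignment_consensus` yields one symbol per column, with gaps counted against conservation.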






Supplementary Data). The first of these, and the most
widespread, is the one typified by Smi1/Knr4, FBXO3,
SKIP16, PGs2 and YobK (it entirely includes the
PFAM model PF09346, 'SMI1/KNR4 family', together with
additional proteins not detected by that model) and
is seen in both bacteria and eukaryotes. This ensemble,
which we term the Smi1-like or SUKH-1 group, includes the
majority of the SUKH domains. We term the second
group, prototyped by Syd, the Syd-like or SUKH-2
group. This group is largely restricted to the
gammaproteobacteria and firmicutes. The SUKH-3
group, prototyped by CA_C3700 (gi: 15896931), is widely
distributed across most bacterial lineages. The group
prototyped by SGR_4389 (gi: 182438182), the SUKH-4
group, is again seen in several bacteria and sporadically
in fungi. The SUKH-5 or US22-like group is present in
fowl adenoviruses, various vertebrate iridoviruses,
archosaur poxviruses (Crocodilepox virus and Fowlpox virus),
and in multiple copies in several herpesviruses
(representatives of the alphaherpesvirus, betaherpesvirus and
alloherpesvirus clades). Members of this group are also
encoded by genomes of the early-branching chordate
Branchiostoma, the salmon, the frog Rana catesbeiana
and the duckbilled platypus, where they appear to have
been acquired from the genomes of integrated
herpesviruses (46). Phylogenetic analysis of each group,
along with the phyletic patterns, strongly suggests that
SUKH domain proteins have been widely disseminated
both within and across the superkingdoms via extensive
lateral transfer (Supplementary Data). In light of this
pattern, the near complete absence of this superfamily in
archaea suggests that specific functional barriers
might prevent acquisition of the SUKH
domain by that superkingdom. Phylogenetic analysis
strongly suggests that the groups SUKH-2 to SUKH-5 are
monophyletic clades. The largest group, SUKH-1, is likely to
represent the ancestral group from within which the
above clades have diversified through rapid sequence
divergence.

Contextual analysis of the SUKH domain proteins suggests
potential functional linkages with nuclease toxins in
bacteria

Contextual information gleaned from gene neighborhoods
in prokaryotes and domain architectures of proteins, when
combined with sequence analysis, can be a powerful
means of discerning protein function (47). Indeed, this
method, which exploits the organizational syntax of tightly
linked genes, has proven particularly effective in both
function prediction and identification of new analogous
systems in the case of toxin-antitoxin and
restriction-modification systems (9,13,14,23,48). To better
understand the role of the SUKH domain, we performed a
detailed analysis of the gene neighborhoods of all bacterial
genes encoding a protein with this domain (Figure 2).
Consequently, we were able to identify at least three
striking themes among the gene neighborhoods of this
superfamily. Firstly, across the bacterial phylogenetic
tree we found numerous genomic neighborhoods that
linked two or more adjacent genes encoding SUKH
domain proteins. In certain cases, e.g. B. grahamii
(gi: 240850988), we found tandem arrays with up to six
paralogous SUKH superfamily genes (Figure 2). We
found that in several instances these paralogous versions
are not closely related, and in certain cases adjacent
paralogs might belong to completely different SUKH
groups. For example, we found combinations of genes
encoding proteins belonging to the Smi1-like (SUKH-1),
Syd-like (SUKH-2), SUKH-3 and SUKH-4 groups in the
same neighborhood in several bacteria such as B. cereus
MM3 and various Streptomyces species (Figure 2). This
observation suggests that there is selective
pressure for the diversification of the linked SUKH
domain proteins encoded in a gene neighborhood, either
via sequence divergence or via independent assembly of
neighborhoods from distantly related paralogs of different
groups. This situation, wherein multiple paralogous genes
are linked together as tandem arrays in a neighborhood, is
relatively rare in bacteria (49). Given that products of
genes linked in conserved gene neighborhoods physically
interact, it is possible that these paralogs interact to form a
single complex (47). On the other hand, the multiple
paralogs could also represent alternative
versions of the same component of a system that is
under selection to display diversity. Given the great
variability in the numbers and types of paralogous versions of
the SUKH superfamily encoded by these neighborhoods,
we favor the latter explanation in this case (for details, see
below). The second major feature that emerged from the
analysis of gene neighborhoods was the linkage of genes
encoding diverse SUKH superfamily members to genes
encoding different types of nucleases (Figure 2). Among
these, we observed multiple linkages in distantly
related bacteria, such as B. thuringiensis, M. marina
and S. griseoflavus, to genes for nucleases of the
metal-dependent NucA family, which includes the
well-studied S. marcescens secreted endonuclease (50)
and the Anabaena non-specific endonuclease NucA,
which degrades both RNA and DNA (51). Another
prominent linkage observed in several bacteria, such as
M. infernorum, various Bacillus species and N. mucosa,
was to genes encoding proteins with an HNH superfamily
nuclease domain (Figure 3). Sequence analysis showed
that several of the HNH domains were related to similar
nuclease domains found in previously studied bacteriocins
such as pyocin AP41 of P. aeruginosa, Klebsiella klebicin B
and colicin E8 of E. coli (52). These linkages involved
members of both the Smi1-like and Syd-like groups;
thus, despite their diversity, potential functional
interactions with different types of nuclease domains are
a common feature of the bacterial representatives of the
SUKH superfamily.
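
The first two neighborhood themes (tandem arrays of SUKH genes, and SUKH genes adjacent to nuclease genes) can be illustrated with a minimal sketch. It assumes a genome has already been reduced to an ordered list of genes with domain labels; the `Gene` record, the function names, the window size and the restriction to the HNH and NucA labels are hypothetical choices for illustration, not the authors' pipeline, and strand and intergenic distance are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str     # hypothetical identifier
    domains: tuple   # domain labels assigned to the encoded protein


# Group names follow the text; the nuclease set is a deliberately small assumption.
SUKH_GROUPS = {"SUKH-1", "SUKH-2", "SUKH-3", "SUKH-4", "SUKH-5"}
NUCLEASE_DOMAINS = {"HNH", "NucA"}


def has_sukh(gene):
    return bool(SUKH_GROUPS & set(gene.domains))


def tandem_sukh_arrays(genes, min_len=2):
    """Return runs of at least min_len adjacent genes that encode SUKH domains."""
    arrays, run = [], []
    for gene in genes:
        if has_sukh(gene):
            run.append(gene)
        else:
            if len(run) >= min_len:
                arrays.append(run)
            run = []
    if len(run) >= min_len:
        arrays.append(run)
    return arrays


def nuclease_linked(genes, window=2):
    """Pair each SUKH gene with nuclease genes at most `window` positions away."""
    pairs = []
    for i, gene in enumerate(genes):
        if not has_sukh(gene):
            continue
        lo, hi = max(0, i - window), min(len(genes), i + window + 1)
        for j in range(lo, hi):
            if j != i and NUCLEASE_DOMAINS & set(genes[j].domains):
                pairs.append((gene.gene_id, genes[j].gene_id))
    return pairs
```

Applied to an ordered gene list, `tandem_sukh_arrays` would flag mixed-group arrays of the kind described for B. cereus MM3, and `nuclease_linked` would list candidate toxin and immunity pairs for closer inspection.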

The third major linkage we observed was between 
SUKH superfamily genes and those encoding gigantic 
bacterial surface proteins with repetitive motifs such as 
the hemagglutinin-repeats, RHS repeats (YD) and 
another previously uncharacterized α-helical repeat
motif. All these proteins showed a characteristic feature 
of possessing a highly variable but globular domain at the 
extreme C-terminus of the protein, downstream of the 
repetitive region. These proteins also usually contain 






[Figure 3 image. Alignment panels for the HNH (structure 2QGP), NucA (structure 1ZM8), GH-E, DH-NNK, WHH, AHH and LHH families, each with secondary-structure and 70% consensus lines. The sequence alignments are not recoverable from the text extraction.]


Figure 3. Multiple sequence alignments and structural scaffolds of the distinct families of the HNH/EndoVII fold recovered in SUKH neighborhoods:
HNH, NucA, WHH, LHH, AHH, DH-NNK and GH-E. Their secondary structures are indicated above the alignments ('e' in blue, β-sheet;
'h' in red, α-helix). The numbers in brackets indicate the residues excluded from the sequences. 'Hash' indicates the residues involved in metal
ion-binding, the 'percent' symbol indicates the conserved histidine required for activation of the water molecule for hydrolysis and 'asterisk'
indicates the conserved asparagines. On the right, structures of the HNH and EndoG families are shown as cartoon representations with the central
structural core colored by structural element type (α-helices in purple, β-sheets in yellow) and key catalytic residues highlighted. For the newly
identified families, inferred topology diagrams of their core nuclease domains are shown with conserved catalytic residues.






certain domains related to adhesion and the two-partner 
secretory (TPS) system N-terminal to the repetitive region, 
such as PAAR (PFAM: PF05488) and the TpsA-secretion 
domain (TpsA-SD, also known as the filamentous hem- 
agglutinin FhaB secretory domain; PFAM: PF05860) with 
a pectate lyase-like fold (53-55). Some of these proteins 
with repetitive domains, which were recovered in our 
analysis of SUKH superfamily neighborhoods, are repre- 
sentatives of toxins of the CDI systems (Figure 2) that 
were reported even as this study was being prepared for 
submission (24,25). Like the above proteins, the CDI 
toxins are characterized by multiple N-terminal 
TpsA-SD domains and hemagglutinin-repeats combined 
with polymorphic C-terminal domains that vary greatly 
between different CDI toxins. In all these CDI proteins 
the polymorphic C-terminal domain is separated from the 
repetitive region by either or both of two small α-helical
domains annotated as domains of unknown function in 
the PFAM database (DUF638 or DUF637). Furthermore, 
it was shown that the gene following the CDI toxin gene
is an immunity gene, whose product
provides resistance against the toxin to the cell that
produces it (25). By this criterion it became clear that the
SUKH superfamily genes in the CDI operons
actually encode immunity proteins for the toxins encoded by the
upstream genes. However, in contrast to the pan-bacterial
distribution of the SUKH superfamily, the CDI operons 
were only observed in proteobacteria (25). Furthermore, 
we observed that polymorphic C-terminal domains of the 
CDI toxins, which are found linked to the SUKH super- 
family immunity proteins in CDI systems, are also seen in 
bacterial lineages outside of proteobacteria, where too 
they are linked to SUKH superfamily genes. In these 
cases they are linked to other N-terminal domains that 
are distinct from the TpsA-SD and hemagglutinin repeat 
domains. Studies on CDI systems indicated that the toxin 
function resides in the polymorphic C-terminal domains 
and at least two of these domains are nuclease toxins that 
cleave both tRNAs and DNA (25). Our above observa- 
tions indicate that outside of CDI systems, the SUKH 
superfamily genes are linked to genes encoding the HNH 
and NucA nucleases; hence, it is likely that even these 
nucleases function as distinct but analogous toxins that 
cleave nucleic acids in target cells. Together, the above
observations raised the possibility that SUKH superfamily
proteins serve as immunity proteins not just
in certain proteobacterial CDI systems but more generally,
across all major bacterial lineages, protecting
against the products of linked genes, which are predicted to act
as toxins.
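
The operon logic described above, in which the gene immediately following a toxin gene is taken as the candidate immunity gene, can be sketched in a few lines of Python. This is a minimal illustration only and not the procedure used in this study: the `Gene` record, the `downstream_gene` helper and the toy locus are all hypothetical, and "downstream" is judged purely from gene order and strand on a single contig.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; field names are illustrative assumptions.
    name: str
    start: int    # leftmost coordinate on the contig
    strand: str   # '+' or '-'
    domain: str   # coarse annotation, e.g. 'toxin', 'SUKH', 'other'


def downstream_gene(genes: List[Gene], toxin_name: str) -> Optional[Gene]:
    """Return the gene immediately 3' of the named gene on the same strand.

    Genes are ordered by coordinate. For a '+' strand gene the downstream
    neighbor is the next gene; for a '-' strand gene it is the previous one.
    Returns None if the gene is absent, has no such neighbor, or the
    neighbor lies on the opposite strand.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next((i for i, g in enumerate(ordered) if g.name == toxin_name), None)
    if idx is None:
        return None
    step = 1 if ordered[idx].strand == "+" else -1
    j = idx + step
    if j < 0 or j >= len(ordered):
        return None
    neighbor = ordered[j]
    return neighbor if neighbor.strand == ordered[idx].strand else None


# Toy locus (invented for illustration): a toxin gene followed by a SUKH gene.
locus = [
    Gene("geneA", 100, "+", "other"),
    Gene("toxin1", 900, "+", "toxin"),
    Gene("sukh1", 4200, "+", "SUKH"),
]
candidate = downstream_gene(locus, "toxin1")
print(candidate.name if candidate else None)
```

In this toy locus the helper returns the SUKH gene as the candidate. A real analysis would also need to consider intergenic distance and conservation of the gene pair across genomes.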

Interestingly, in addition to gene-neighborhoods with 
multiple tandem divergent SUKH superfamily genes, in 
several bacteria, we also observed notable lineage-specific
expansions of SUKH domain proteins (e.g. 21 paralogs in
Gemmata obscuriglobus, 20 paralogs in C. gingivalis and 15
in S. albus). These observations also make sense in light of 
the above toxin-immunity protein hypothesis: while the 
SUKH superfamily gene adjacent to a nuclease or CDI 
toxin gene is likely to provide immunity to the 'self' toxin,
the supernumerary SUKH superfamily genes, which occur
as tandem arrays or as isolated versions, might provide
immunity against other 'non-self' toxins delivered by
competing bacteria in the environment. Such associations 
of multiple distinct immunity genes have also been 
observed in the case of plasmid-borne colicin gene 
operons (8). Other features of the genomics of the 
SUKH superfamily also support this proposal. First, gene
neighborhoods encoding SUKH proteins and linked
nucleases or CDI toxins are highly variable in their
presence or absence among different strains of the
same species or among closely related species
that share an otherwise similar genomic organization.
Secondly, there appear to have been recent duplications 
of entire loci encompassing these gene-neighborhoods 
within the same genome in several bacteria 
(Supplementary Data). This kind of phyletic and 
genomic polymorphism is also typical of loci involved in 
inter- and intra-genomic competition such as toxin-anti- 
toxin, restriction-modification and virulence toxin systems 
(6,9,10,15), suggesting that even systems with SUKH 
superfamily proteins might have comparable roles. To
test this proposal, as a first line of investigation,
we explored further the link between nucleases
and the SUKH domain proteins. While the polymorphic
C-terminal domains of two CDI toxins have been
characterized as nucleases, the C-terminal domains of
those CDI toxins which are found linked to the SUKH
superfamily immunity proteins have not been characterized.
We speculated that these domains, along with some of the 
other uncharacterized domains in proteins encoded by 
conserved gene-neighborhoods containing a SUKH super- 
family gene, might be as yet uncharacterized nuclease 
domains. As a second line of investigation we sought to 
uncover those among the associated uncharacterized 
domains, which might have a role in distinct toxin- 
trafficking mechanisms, comparable to the two-partner 
system used by the proteobacterial CDIs. Therefore, to 
accomplish these two objectives and identify other com- 
ponents of these systems we resorted to systematic 
sequence analysis of the uncharacterized proteins 
recovered in the above gene-neighborhood analysis. 
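
The lineage-specific expansions mentioned above amount to counting distinct SUKH domain proteins per genome. The sketch below shows one way to tabulate such counts; it is an illustration under stated assumptions, not the method of this study. The function names, the threshold of 10 paralogs, the protein identifiers and the second genome are hypothetical, and only the count of 21 for G. obscuriglobus is taken from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def paralog_counts(hits: Iterable[Tuple[str, str]]) -> Counter:
    """Count distinct domain-containing proteins per genome.

    `hits` is an iterable of (genome, protein_id) pairs; a pair that
    appears more than once is counted a single time.
    """
    unique = set(hits)
    return Counter(genome for genome, _ in unique)


def expanded_genomes(counts: Counter, min_paralogs: int = 10) -> Dict[str, int]:
    """Return genomes whose paralog count meets an arbitrary threshold.

    The default threshold is an assumption chosen only for this example.
    """
    return {g: n for g, n in counts.items() if n >= min_paralogs}


# Toy hit table: protein identifiers and 'genome_X' are invented.
hits = [("G. obscuriglobus", f"prot{i}") for i in range(21)]
hits += [("genome_X", "prot0"), ("genome_X", "prot1"), ("genome_X", "prot1")]

counts = paralog_counts(hits)
print(expanded_genomes(counts))
```

Any real threshold for calling an expansion would have to be set relative to the distribution of counts across all genomes surveyed.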

Sequence analysis reveals the presence of 11 distinct 
families of nuclease toxins encoded by genes 
adjacent to those of the SUKH superfamily 

Sequence analysis indicated that at least 11 distinct 
families of domains recovered in our searches in proteins 
encoded by genes adjacent to one encoding a SUKH 
domain protein are potential nucleases. While some of 
these, as noted above, belong to the earlier characterized 
families, several of those identified here belong to entirely 
new families or are highly distinctive previously unrecog- 
nized versions of previously known families (Figures 3-5 
and Supplementary Data). Identification of this diverse 
panoply of nuclease domains as being functionally 
linked to the SUKH domain lends critical support to the 
proposal that this domain functions primarily as an 
immunity protein against nucleic acid-targeting toxins in 
bacteria. We briefly describe below these newly identified 
nuclease domains. 






Figure 4. (A) Multiple sequence alignment of the EndoU family emphasizing the new bacterial versions found in this study. The eukaryotic EndoU
domain (PDB: 2C1W) is shown to the right to indicate the spatial position of the conserved elements and the two units with three strands each.
(B) Multiple sequence alignment of the newly identified REase family. The structure of the archaeal Holliday junction resolvase (PDB: 1OB8) is
shown to the right to indicate the spatial location of the conserved residues in this fold. Secondary structure elements are indicated above the
alignments ('e' in blue, β-sheet; 'h' in red, α-helix). The numbers in brackets represent excluded residues from sequences and 'hash' indicates the
catalytic residues.



Nuclease toxins of the HNH/EndoVII fold. The HNH or
EndoVII fold is a version of the treble-clef fold. The
treble-clef fold is one of the most prevalent Zn-binding 
motifs across the three superkingdoms of life (56). 
Classical HNH nucleases, like the restriction endonuclease 
(REase) McrA and the T4 endonuclease VII, contain the 
four conserved, Zn-chelating cysteines of the treble-clef 
fold (52). However, these cysteines are lost in several 
forms, such as the REase MboII, colicin E8 and the 
NucA family, but these domains still retain the character- 
istic structural geometry of the treble-clef (52,56). The 
active site of these enzymes is formed at the interface of 
the characteristic helix and β-hairpin and contains a
divalent cation, which is chelated by three polar residues
usually from the first strand of the β-hairpin and the
C-terminal helix of the treble-clef fold. The residues 
chelating the metal are typically histidine, aspartate and 
asparagine but their exact configuration can greatly vary 
between different members of this fold making them dif- 
ficult targets for identification through sequence analysis 
(52). Among the nucleases of this fold occurring in the 
neighborhood of the SUKH superfamily we observed 
eight distinct families spanning the entire gamut
from conventional HNH nucleases to certain highly
derived forms that have not been identified before. The
conventional HNH versions (e.g. AM1_0143, gi: 158333371
from the cyanobacterium A. marina) retain all four
cysteines of the treble-clef fold and a typical arrangement 
of residues chelating the catalytic metal. Others, like the 
nuclease domains of the PSPTO_3229 protein from 
P. syringae (gi: 28870395) and some CDI proteins, 
belong to the colicin E7/E8/E9 family (Figure 2). 
A highly derived version is represented by the NucA 
family (57), where structural analysis reveals that a 
treble-clef domain which has lost the characteristic
cysteines is inserted between two copies of a
three-stranded domain with distinct loop-like C-terminal 
extensions (Figure 3). We uncovered several divergent, 
earlier unrecognized NucA family nuclease domains in 
both the SUKH superfamily neighborhoods and CDI 
systems, such as those typified by the B. subtilis protein 
YeeF (gi: 251757354). The structural organization of the 
NucA domain suggests that it arose from an ancestral 
HNH/EndoVII domain, which 'carried' these duplicated 
three-stranded units along with it to form a more complex 
domain. Consistent with this proposal, we discovered 
a family of novel HNH fold nucleases in our 
gene-neighborhoods, which contain an active site similar 
to the NucA nucleases, but are standalone versions 
without the two flanking three-stranded units. We called 
this family GH-E after the three conserved residues 
associated with the active site typical of these domains. 
Interestingly, a subset of the GH-E family preserves the 
conserved cysteines of the treble-clef suggesting that they 
indeed represent the potential evolutionary intermediate 
from a classical HNH domain to the derived NucA-like 
forms (Figure 3). 

We also recovered three other novel families of 
domains, which are respectively typified by nearly abso- 
lutely conserved tripeptide sequence motifs LHH, WHH 
and AHH (Figure 3). Most CDI operons, which encode a 
SUKH domain immunity protein, have proximal toxin 
genes with an LHH domain as the polymorphic
C-terminal unit of their products (Figure 2). 
Additionally, the LHH domain is found in products of 
genes adjacent to the SUKH superfamily gene outside 
of proteobacteria in several other bacterial lineages 
such as firmicutes, actinobacteria, bacteroidetes and 
planctomycetes (Figure 2). Although we also found the 
WHH domain as the polymorphic toxin unit of a subset 






[Figure 5 graphic. Panels: Firmicutes; Proteobacteria; Actinobacteria and others. Legend entries: N-terminal trafficking domains; Central domains; C-terminal toxin domains; ATP-dependent YueA-like pumps of the HerA-FtsK superfamily; Sec-dependent system; Two-partner system; Potential activity (DNase; DNase/RNase).]

Figure 5. Domain architectures of selected examples of nuclease toxins encoded in the neighborhood of the SUKH superfamily genes. A domain
architecture network of these toxins is shown to illustrate the directionality and syntactical features of their organization. Arrows indicate the
polarity of domain arrangement in a polypeptide with the arrowhead pointing to the C-terminus. Newly identified domains include DUF637-N,
A-link (α-helical PT domain), WXG-like, LDXD, NUC_N, PT-TGE which are non-catalytic domains, and AHH, LHH, WHH, DHNNK, GH-E,
EndoU, REase, [NS]HH, DEAM, which are toxin domains. The CdiAC domain is a predicted nucleic acid modifying domain that is present in the
C-termini of CdiA proteins of Photorhabdus and E. coli.



of proteobacterial CDI systems, none of these have a 
SUKH superfamily immunity protein. However, we 
found several non-CDI gene neighborhoods, which are 
likely to define distinct but analogous toxin systems, in 
proteobacteria, firmicutes, actinobacteria, synergistetes 
and bacteroidetes that combine genes for WHH and 
SUKH domain proteins (Figure 2). The AHH domain is 
also found in similarly organized gene-neighborhoods 
from the same bacterial lineages as those in which the 
WHH and LHH domains are found. Profile-profile com- 
parisons with multiple alignments of all these three novel 
domains indicated that the best matches are families of the 
HNH fold. Indeed, a visual examination of the conserva- 
tion patterns of these three domains showed that the HH 
dyad shared by them corresponds to the HH or DH 
dipeptide found in the first strand of treble-clef fold of 
the classical HNH domains (52). The first H forms one 



of the catalytic metal-chelating ligands and the second H 
contributes to the active site that directs the water for 
phosphoester hydrolysis (58). Further, the sequence align- 
ments of the LHH, WHH and AHH motifs revealed two 
further conserved histidines, which were associated with 
the helix of the treble-clef fold and aligned with the two 
C-terminal metal-chelating residues in the profile of the 
classical HNH domains (58,59). These observations 
indicated that the LHH, WHH and AHH domains are 
highly derived versions of the HNH fold. The eighth 
family of HNH fold enzymes emerging from this
analysis comprises domains typified by that of the protein
Dd586_1447 (gi: 271499995) from D. dadantii, which are found in
predicted toxins in SUKH neighborhoods and also in
the products of CDI operons which do not contain a
SUKH-type immunity protein (Figure 2). A subset of
these domains constitutes the PFAM model for a 






'domain of unknown function', DUF1994, that does not 
define the boundaries of this domain precisely. We were 
able to define the proper boundaries of this domain by 
using the diversity of distinct architectural contexts in 
which we detected it and used the refined alignment for 
profile-profile comparisons. This comparison revealed the 
representatives of HNH domains as the best hits and 
indicated a perfect match between the polar residues 
conserved in this domain and catalytic and active-site 
metal chelating residues of the classical HNH domains. 
We named this family of HNH domains DH-NNK
after the conserved DH dyad in strand 1 and the
two asparagines and lysine which are conserved in the
helix of the core treble-clef fold (Figure 3). While all
these above versions have lost the cysteines of the ancestral
treble-clef, they nevertheless retain the catalytic
configuration typical of those nucleases. Hence, we predict
that these domains are likely to be nucleases with a 
similar catalytic mechanism. Practically all characterized 
HNH fold nucleases, barring those of the NucA family, 
which show a distinct active metal chelating site (51), have 
a preference for DNA substrates. Hence, it is likely that 
most of these domains are the active components of toxins 
that hydrolyze DNA in the target cells. 
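
As a rough illustration of the signature motifs named above, the following Python sketch scans a protein sequence for the LHH, WHH and AHH tripeptides. It is a toy example under stated assumptions: a bare tripeptide match is far too permissive to assign a protein to any of these families, which in this work rested on profile-based comparisons and conservation patterns, and the function name and test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Tripeptide signatures named in the text. Matching them alone is NOT
# sufficient evidence of family membership (simplifying assumption).
HH_DYAD = re.compile(r"[LWA]HH")


def find_hh_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position, motif) for each LHH/WHH/AHH occurrence."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(0)) for m in HH_DYAD.finditer(seq)]


# Invented toy sequence, not taken from any real protein.
toy = "MKTLHHGSAWHHPQ"
print(find_hh_motifs(toy))  # [(4, 'LHH'), (10, 'WHH')]
```

A scan of this kind is at best a quick pre-filter; candidates would still need to be checked against a multiple alignment for the additional conserved histidines described above.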

Nuclease toxins of the EndoU fold. The EndoU nuclease 
domain is typified by the nuclease domain previously 
identified in the U-specific, metal-dependent endonucle- 
ase, which in eukaryotes processes intron-encoded U16 
and U86 snoRNAs and generates products with 2',3'-
cyclic phosphate and 5'-OH termini (60). A related endo- 
nuclease was identified in nidoviruses, such as the severe 
acute respiratory syndrome coronavirus where it appears 
to process RNAs as a part of the replication complex 
(60,61). Our structural analysis revealed that the catalytic 
domain of these enzymes contains two elements each 
comprised of a single helix followed by a three-stranded 
unit. This suggests that it is likely to have emerged 
through duplication of the simple helix-three-strand struc- 
tural element, followed by flipping of the sheet in one of 
the units (Figure 4A). The catalytic residues, i.e. two his- 
tidines, appear to have emerged asymmetrically in 
a peculiar hairpin insertion within the helix of the first 
repeat. This hairpin insertion appears to be mobile 
and adopts different conformations in structures 
(60,61) — this mobility might have a role in accommo- 
dating the substrate between the helix and the sheets 
formed by the three-stranded units of the repeats (Figure 
4A). We found that the bacterial members of the EndoU 
family are linked to genes of the SUKH superfamily 
mainly in firmicutes and proteobacteria (Figure 2). 
Other than SUKH superfamily gene-neighborhoods, 
related versions also comprise the polymorphic 
C-terminal domain of the CDI toxins from Moraxella 
and Mannheimia that, however, lack a SUKH superfamily 
immunity gene. A further set of bacterial nucleases of this 
family are predicted secreted versions encoded by intracel- 
lular symbiotic and pathogenic bacteria, such as 
Wolbachia (gi: 310643370) and Ehrlichia (gi: 73666818). 
Most bacterial versions that we identified are extremely 
divergent relative to the eukaryotic and viral forms and 



are not recognized by the previously available HMM 
models for this nuclease (PF09412). Hence, the identifica- 
tion of these relationships represents a significant exten- 
sion of this superfamily (Figure 4A, Supplementary Data). 
Versions within these gene-neighborhoods show consider- 
able variability including loss of strands from the first unit. 
This variability suggests that the EndoU fold readily 
tolerates drastic modification, which in 
turn might help it recognize a diverse spectrum of sub- 
strates. On the precedent of the eukaryotic EndoU and 
the nidoviral nuclease and their genomic organization we 
suggest that the majority of the bacterial EndoU 
homologs are nuclease toxins that cleave RNAs in the 
competitor cells. Those secreted by intracellular bacteria 
could be deployed as toxins or regulators to manipulate 
host physiology by cleaving specific transcripts. With the 
identification of these new EndoU homologs it becomes 
clear that the bacteria contain the greatest diversity of this 
superfamily, with certain versions closer to the eukaryotic 
and nidoviral versions and others that are more divergent 
(Supplementary Data). This suggests that the original ra- 
diation of this superfamily probably happened within the 
bacterial toxin systems and that its members were subsequently acquired, 
perhaps from intracellular symbiotic bacteria, by eukary- 
otes and viruses. In the latter they appear to have been 
recruited as RNA processing enzymes. 

Nuclease toxins of the REase fold. The REase fold is a 
highly versatile fold that accommodates considerable 
structural diversity and has, not surprisingly, been used 
as the primary fold from which REases of 
restriction-modification systems are derived (48,52). We 
also found several proteins with this fold to be encoded 
by genes that are neighbors of SUKH superfamily genes 
(Figure 2). These versions were originally identified as a 
distinct conserved domain of unclear affinities — both 
PSI-BLAST and HMMER searches failed to identify 
any relationship with previously known domains. 
However, we observed that the multiple sequence align- 
ment of this domain showed a characteristic signature of 
conserved residues of the form GE-D-ExK-Q (Figure 4B) 
that matched the pattern of similar conserved residues in 
the lambda exonuclease and the RecB family of the REase 
fold (52,62). The predicted secondary structure pattern of 
these domains also closely matched the REase fold with 
the conserved D and ExK motifs falling on a β-hairpin as is 
typical of the REase fold (Figure 4B). These observations 
induced us to use the alignment of this domain in a 
profile-profile comparison with the HHpred program, 
and we recovered a composite profile made of diverse 
REase fold superfamilies such as the VRR-Nuc, lambda 
exonuclease, the archaeal Holliday junction resolvase and 
RecB as the best hits (P = 10^-5). This suggested that this 
family defines a novel group of REase-fold nucleases. 
Given that the majority of the REase-fold enzymes are 
DNases, we predict that these toxins are likely to cleave 
the DNA of the target cells. 
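
The signature described above (GE, then D, then ExK, then Q, in that order) can be screened for with a simple pattern. The following is only a minimal sketch and not the profile-based procedure used in this study: the hyphens in GE-D-ExK-Q stand for spacers of unspecified length, so the gap bounds (`min_gap`, `max_gap`) and the function names are assumptions introduced purely for illustration.

```python
import re

# Assumed spacer bounds: the text gives only the order of the conserved
# residues (GE ... D ... ExK ... Q), not the distances between them.
MIN_GAP, MAX_GAP = 5, 40


def rease_signature(min_gap=MIN_GAP, max_gap=MAX_GAP):
    """Compile a regex for GE-D-ExK-Q with variable-length spacers."""
    gap = rf"[A-Z]{{{min_gap},{max_gap}}}?"  # lazy spacer of any residues
    return re.compile(rf"GE{gap}D{gap}E[A-Z]K{gap}Q")


def find_signature(seq, pattern=None):
    """Return (start, end) of the first signature match, or None."""
    pattern = pattern or rease_signature()
    match = pattern.search(seq.upper())
    return None if match is None else (match.start(), match.end())
```

A match of this kind is at best a first filter; a regular expression cannot substitute for the alignment, secondary structure and profile-profile evidence described in the text.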

Nuclease toxins of the cytotoxic RNase fold. The last 
family of nucleases that we found encoded by genes 
linked to the SUKH superfamily genes was the cytotoxic 
RNase family (63). This nuclease domain was first 
characterized as the toxin domain of the colicins E3 and 
E6 and is typified by a conserved active site configuration 
with an aspartate followed by a glutamate sandwiched 
between two histidines (Supplementary Data). The 
version of this domain in colicin E3 has been 
demonstrated to function as an endoRNase that specif- 
ically cleaves the phosphoester bond between bases 1493 
and 1494 of 16S ribosomal RNA (63). Given that versions 
detected in systems characterized in our current study are 
closely related to the version found in colicin E3 and E6, 
we posit that these nuclease domains act as RNases that 
similarly cleave RNA in the target cells. 

Other domains with a possible role in nucleic acid 
modifications. We found three other families of domains 
in proteins that were encoded by genes which occupied 
positions adjacent to SUKH superfamily genes in certain 
predicted operons, equivalent to positions of the genes 
encoding the above nucleases. Additionally, these 
families of domains are also found as representatives of 
the polymorphic C-terminal module of the proteobacterial 
CDIs. Together these observations hinted that they are 
potentially uncharacterized enzymatic domains operating 
on nucleic acids. PSI-BLAST and JACKHMMER 
searches showed that the first of these families belonged 
to the nucleotide deaminase superfamily that includes 
RNA-editing enzymes, such as the APOBECs and 
DNA-modifying enzyme AID of vertebrates. Hence, like 
the nucleases, these enzymes are likely to function as 
toxins that mutate nucleic acids in the target cells. We 
discuss the natural history of these enzymes in a 
separate article (Iyer LM, Zhang D, Aravind L, manu- 
script in preparation). The second of these families 
prototyped by the B. cereus protein bcere0017_55840 (gi: 
229119351) is characterized by a conserved signature 
[NS]HH followed by another conserved histidine 
(Supplementary Data). Although we were unable to 
unify this family with any of the other nuclease folds, 
the presence of the HH motif typical of many of the 
above families of HNH/EndoVII fold nucleases might 
point to a divergent relationship with those proteins. 
The third of these families, typified by the CDI system 
from P. luminescens (gi: 37524545) includes a globular 
domain of 170-200 amino acids that might define yet 
another uncharacterized nucleic acid-modifying domain 
(CdiAC in Figure 5, Supplementary Data). 

Identification of conserved domains with potential roles as 
trafficking components and auxiliary partner proteins of 
the SUKH superfamily-toxin systems 

Earlier characterized toxin systems such as the classical 
plasmid-encoded bacteriocins and the recently 
characterized CDI systems use thematically comparable, 
albeit biochemically distinct mechanisms for trafficking of 
nuclease toxins. While these systems have been used 
as models to understand bacterial protein trafficking, the 
complete set of events starting from the extrusion of the 
'pro-toxin' by the producing cell to its recognition at 
the target cell surface and delivery into the target cell 
is only partially understood (8). Classical plasmid-borne 
colicins and cognate bacteriocins from other bacteria do 
not have secretory mechanisms and their release appears 
to occur primarily through cell-lysis mediated by 
the colicin-release proteins (8). Colicin-like bacteriocins 
are multidomain proteins with an extreme C-terminal 
toxin module, which is either a nuclease or a membrane- 
perforating domain (e.g. colicins E1 and A) (8). They typ- 
ically possess two additional N-terminal modules, of 
which the first facilitates translocation across the target 
cell membrane and the second (i.e. the central module) 
facilitates binding to a membrane receptor on the target 
cell. These colicins hijack either the Tol or the Ton- 
dependent molecular import systems to enter the target 
cells (8). The chromosomally encoded proteobacterial 
CDI system toxins do not require lysis; instead they are 
trafficked out of the cell which produces them via the two- 
partner-system that depends on the CdiB proteins 
belonging to the TpsB class of outer-membrane trafficking 
proteins (25). These latter proteins contain N-terminal 
periplasmic polypeptide-transport-associated (POTRA) 
domains linked to a C-terminal β-barrel transmembrane 
domain. They recognize the secretory domains such as the 
TpsA-SD in the extreme N-terminal region of the CDI 
'pro-toxins' to deliver them across the outer membrane 
of proteobacteria (64). This N-terminal region is separated 
from the C-terminal regions by repetitive regions with 
RHS- or filamentous hemagglutinin-type repeats. Their 
uptake by the target cell is less-clearly understood. In 
the well-studied examples, the first step of this process 
appears to depend on the outer membrane-biogenesis 
protein BamA recognizing a conserved α-helical domain 
immediately N-terminal to the toxin module, with a 
VENN signature that overlaps with the PFAM model 
termed 'DUF638'. Subsequently the inner-membrane 
protein AcrB, a transporter, appears to be necessary for 
uptake into the target cell cytoplasm (25). Additionally, it 
is posited that a proteolysis step at the cell surface releases 
just the C-terminal nuclease module for uptake by the 
target cell (25). Thus, despite the differences between the 
CDI and classical colicin-like systems they share a 
common feature of the toxin activity being borne by the 
extreme C-terminal domain in a multidomain polypeptide. 
Further, the modules located immediately N-terminal to 
the nuclease domain (e.g. the α-helical domain with the 
VENN motif ~PFAM DUF638) are involved in associ- 
ation with receptors on the target cell. Hence, we term 
these domains collectively the pre-toxin (PT) domains. 
The extreme N-terminal domains appear to play a 
critical role in export from the host cell in the cases 
where lysis is not involved, i.e. typically chromosomally 
borne versions. These observations accordingly presented 
the organizational logic for these systems, wherein there 
are usually three functionally distinct sets of modules in 
the pro-toxin going from the N- to the C- terminus of the 
protein. 
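
The organizational logic summarized above can be written down as a small data structure. This is only an illustrative sketch: the label strings (`"TpsA-SD"`, `"RHS"`, `"FHA-repeat"`, `"VENN"`) and the function name `split_protoxin` are hypothetical names chosen for the example, the label sets are far from exhaustive, and the rule that the last domain is the toxin simply encodes the C-terminal placement of the toxin module described in the text.

```python
# Illustrative label sets built from domains named in the text; the label
# strings are hypothetical and the sets are not exhaustive.
EXPORT = {"TpsA-SD"}             # extreme N-terminal secretory domain
REPEATS = {"RHS", "FHA-repeat"}  # repetitive regions separating the termini
PRE_TOXIN = {"VENN"}             # pre-toxin (PT) domain


def split_protoxin(domains):
    """Split an N->C ordered list of domain labels into functional modules."""
    if not domains:
        raise ValueError("architecture must contain at least one domain")
    *upstream, toxin = domains   # the toxin is taken to be the last domain
    modules = {"export": [], "repeats": [], "pre_toxin": [], "other": []}
    for name in upstream:
        if name in EXPORT:
            modules["export"].append(name)
        elif name in REPEATS:
            modules["repeats"].append(name)
        elif name in PRE_TOXIN:
            modules["pre_toxin"].append(name)
        else:
            modules["other"].append(name)
    modules["toxin"] = toxin
    return modules
```

For example, `split_protoxin(["TpsA-SD", "RHS", "VENN", "HNH"])` assigns one domain each to the export, repeat and pre-toxin modules and treats the HNH domain as the toxin.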

Analysis of the domain architectures of the nuclease 
domain-containing proteins encoded in the SUKH- 
superfamily neighborhoods revealed that the majority of 
the proteins followed an architectural logic which was 
consistent with the above-described organization of these 
earlier studied toxin systems (Figure 5). However, only a 
relatively small subset of the SUKH domain-associated 
systems overlaps with the CDI systems. Further, the 
SUKH superfamily proteins and functionally linked 
toxins are also found outside of proteobacteria, in 
lineages lacking outer membranes and CdiB-like delivery 
systems. We reasoned that analysis of these distinct 
pre-nuclease and extreme N-terminal domains might 
reveal features pertaining to the trafficking of toxins in 
non-CDI systems and point to alternative delivery 
mechanisms. 

Identification of multiple distinct trafficking systems for 
toxins encoded in SUKH superfamily neighborhoods. We 
observed that in Gram-positive bacteria, proteins with 
the C-terminal nuclease typically possessed one of a set 
of several distinct domains at the extreme N-terminus of 
the protein (Figure 5). A significant subset of these could 
be unified using sequence profile searches with the 
PSI-BLAST and JACKHMMER programs to the 
WXG/ESAT6 superfamily of α-helical domains (65). 
These domains are a specific signal recognized by the 
YueA-like ATPases of the HerA-FtsK superfamily that 
secrete them in an ATP-dependent manner (65,66). This 
indicated that the WXG/ESAT-6 domain-containing 
toxins in Gram-positive bacteria are extruded by 
YueA-like pumps using an ATP-dependent mechanism. 
A significant subset of toxin proteins from firmicutes pos- 
sessed a distinctive N-terminal domain that could not be 
unified with any earlier known domain (a subset of these 
have been included in the erroneously annotated model 
Transposase_30 of PFAM; PF04740). Sequence searches 
showed that this domain possessed a conserved [LF]XG 
sequence motif and it was predicted to assume an α-helical 
bundle fold based on the multiple sequence alignment 
(Supplementary Data). We accordingly termed it the 
LXG domain (Figure 5) and were able to unify it with 
the WXG domain by means of profile-profile compari- 
sons with the HHpred program (P = 10 ). Contextual 
analysis indicated that this domain is encoded in certain 
conserved gene-neighborhoods across firmicutes, where it is 
associated with genes coding for a YueA-like HerA-FtsK 
superfamily protein pump and a small protein related to 
the S. aureus EsaC protein (gi: 282917938, Supplementary 
Data). Through profile-profile comparisons we showed 
that the EsaC-like superfamily is a bacterial version of 
the eukaryotic EVH1 peptide-binding domains with the 
PH-like fold (HHpred P-value: 10^-4) (67). These observa- 
tions suggest that the LXG domain is comparable to the 
WXG/ESAT-6 domain, and is likely to utilize the 
ATP-dependent YueA pumps and the potential 
peptide-binding EsaC domain as partners for extrusion 
from the producing cell. The protein Srot_0310 
(gi: 296392744) from the actinobacterium S. rotundus 
contains two copies of a distinct domain N-terminal to 
the GH-E nuclease domain (Figure 5). This domain is 
also widely found in several actinobacteria at the 
N-termini of putative cell-surface proteins. Profile- 
profile comparisons suggested a possible relationship 
between these N-terminal domains and the WXG 
domain suggesting that it might be yet another 
representative of the WXG-like superfamily (P = 10^-4) 
and might utilize a similar ATP-dependent mechanism 
for its extrusion. A fourth group of proteins, restricted 
to certain firmicutes (e.g. S. aureus SACOL0281 protein; 
gi: 57652555), is typified by yet another N-terminal 
α-helical domain (LDXD in Figure 5) that is also found 
in domain architectural contexts very similar to the WXG 
and LXG domains. It is conceivable that this domain 
is comparable to them and functions similarly as a 
mediator of export via the HerA-FtsK superfamily 
pumps. Thus, a notable mode of export of nuclease 
toxins in Gram-positive bacteria appears to be via the 
ATP-dependent extrusion system, which while biochem- 
ically distinct from the TPS of the proteobacteria, is 
thematically comparable. 

In Actinobacteria, but not firmicutes, we observed 
several large proteins with architectures similar to the 
CDIs of the proteobacteria. These typically contain 
RHS repeats; however, their extreme N-terminal 
domains did not bear any close relationship to the 
proteobacterial TpsA-SD. Instead they were found to 
contain an N-terminal signal peptide and some of these 
proteins (e.g. gi: 256812841, a protein from S. griseus) 
contain multiple lamininG domains embedded within 
repetitive regions. The protein DIP1652 (gi: 38234225) 
from C. diphtheriae shows another distinct low complexity 
repeat N-terminal to the nuclease domain (Figure 5) and 
like in the above case it also possesses a conventional 
signal peptide. Likewise, a distinctive signal peptide, 
which is highly conserved in multiple proteins only 
within the genus Planctomyces, is seen in predicted 
nuclease toxins from this organism (e.g. gi: 149178028). 
Another group of large toxin proteins with RHS repeats, 
which predominantly occur in proteobacteria, are defined 
by the presence of repeats of the PAAR domain (PFAM: 
PF05488) N-terminal to the RHS repeats. All these 
proteins are typified by the presence of a conserved trans- 
membrane domain with two TM segments (Figure 5 and 
Supplementary Data) just N-terminal to the PAAR 
domains. We propose that these TM segments are 
required for their trafficking to the cell membrane, follow- 
ing which they might be processed in the periplasm for 
release via the outer membrane in a process that might 
depend on the PAAR domains. We also noticed a com- 
parable domain with two TM segments in a few firmicutes 
(e.g. gi: 125974537 from C. thermocellum) and in 
chlamydiae (e.g. gi: 189219187 from M. infernorum, which 
is a rare case of the nuclease domain occurring 
N-terminal to the two TM domain; Figure 5). These 
proteins lack PAAR domains but the firmicute versions 
have additional hedgehog-intein (HINT) peptidase 
domains (see below) that could aid in their release on 
the cell-surface (Figure 5). These observations suggest 
that at least some nuclease toxins in bacterial lineages 
such as actinobacteria, bacteroidetes and planctomycetes 
with conventional signal peptides, and those in 
proteobacteria, chlamydiae and firmicutes with two-TM 
domains are probably delivered to the cell using the con- 
ventional Sec-dependent system (68). In the context of the 
above cases, it is of interest to note that E. coli Syd, an 
archetypal member of the SUKH superfamily, was first 
identified as a possible proof-reading component of the 
Sec-dependent export system (43-45). In this context it is 
possible that the binding of certain members of the SUKH 
superfamily (at least the Syd-like group) in the producing 
cell might not only help in conferring immunity to 'self' 
but also in guiding the 'pro-toxin' to the Sec-dependent 
export machinery. 

Neither actinomycetes nor firmicutes display 
proteins with a PT domain with the VENN motif 
(PT-VENN). However, we observed that in both these 
lineages there was a conserved α-helical domain that fre- 
quently occurred just to the N-terminus of several distinct 
nuclease modules in different predicted toxins. This 
domain had a conserved TG motif and we accordingly 
named it the PT-TG domain (Figure 5, Supplementary 
Data). The PT-TG domain might play a role similar to 
the PT-VENN domain in Gram-positive bacteria and 
mediate interaction of the extruded toxin with cell-surface 
receptors on the target cells. The complementary distribu- 
tion of the PT-VENN and PT-TG domains in 
proteobacteria and Gram-positive bacteria suggests that 
they are distinct adaptations related to the drastically dif- 
ferent cell-surface morphologies of the respective groups. 
Another domain, which we found frequently associated 
with several unrelated or distantly related nuclease 
domains from Gram-positive bacteria, was the 
Nuclease_N domain (Figure 5, Supplementary Data). It 
is predicted to be an α-helical domain and might also play 
a role in the delivery of the toxin module into the host 
cells. Toxins in the SUKH superfamily neighborhoods, 
irrespective of the type of the nuclease domain, can also 
be distinguished into two major architectural groups: one 
comprised of relatively small proteins with no notable 
stretches of repetitive sequence separating the N- from 
the C-terminal regions, and the second in which such 
repetitive sequences, such as the RHS and the filamentous 
hemagglutinin are present (Figure 5). This might reflect a 
mechanistic difference in their mode of action: the smaller 
proteins could be soluble toxins that diffuse away from the 
cell producing them. In contrast, the large proteins with 
repetitive elements might form filamentous appendages 
that stick out from the cell-surface and depend primarily 
on contact with target cells for delivery [Hence, the latter 
group includes the recently characterized CDIs (25)]. 
Alternatively, this difference might reflect the differences 
in the cell-wall structures of the bacterial lineages, with the 
smaller toxin proteins being more prevalent in the 
firmicutes. A subset of the smaller proteins with nuclease 
domains lack noticeable trafficking-related (N-terminal) 
domains. The corresponding genes could represent cas- 
settes for alternative toxin modules that are linked by 
recombination to the larger full-length genes (Figure 5, 
see below). 

Other auxiliary domains which might play a role in 
resistance, trafficking or processing of toxins. Several 
other domain families were found to be encoded by 
genes having persistent association with the SUKH super- 
family neighborhoods across distantly related bacterial 
species. One of these is the SuFu superfamily (Figure 2 
and Supplementary Data) prototyped by the Suppressor 
of Fused protein from Drosophila (69). In addition, we 
also detected members of this superfamily to be encoded 
by CDI-like operons, such as the one from N. gonorrhoeae 
that encodes a toxin with a distinct version of the HNH 
fold nuclease domain (toxin NGO1392, gi: 59801740; 
Supplementary Data). In these cases the SuFu superfamily 
gene occupies a position equivalent to that of the SUKH 
superfamily gene, suggesting that they might be function- 
ally comparable. We also found several examples wherein 
the SuFu and SUKH domains are combined in the same 
polypeptide (Figures 1 and 5). Based on these associations 
we propose that the SuFu domain represents a second 
widely conserved domain that functions as an immunity 
protein for diverse nuclease toxins. Two other conserved 
protein families are encoded in the toxin neighborhoods 
(SUKH-neighborhood conserved family 1 and 2; SNCF1 
and SNCF2, Supplementary Data) that occupy positions 
similar to the SUKH and SuFu superfamily genes (Figure 
2). They were not found in multi-domain architectures 
typical of the nuclease toxins and always occurred as 
proteins with standalone domains. This suggested that 
they were unlikely to be novel toxins but act as alternative 
immunity proteins just like the SuFu and SUKH super- 
family proteins. The HINT domain, prototyped by the 
peptidase domains of the animal hedgehog proteins and 
protein-splicing inteins, is also frequently associated with 
SUKH superfamily neighborhoods (70-72). These 
versions of the HINT domain are closer to those found 
in several bacterial surface proteins and the secreted 
animal proteins such as hedgehog and the C. elegans 
Hog proteins (70). When present in a multidomain 
'pro-toxin' protein, the HINT domain always occurs sand- 
wiched between the PT domains such as PT-VENN and 
PT-TG and the nuclease toxin domain. This location of 
the HINT domain suggests that it is likely to serve as a 
peptidase that undergoes autoproteolytic cleavage, similar 
to what is observed in hedgehog and the inteins (70), to 
release the C-terminal nuclease domain for uptake by the 
target cell. It is conceivable that this cleavage step is 
regulated by the interaction of the PT domains with the 
surface receptor on the target cell. 
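
The positional rule stated above for the HINT domain can be expressed as a simple check on an ordered domain list. This is a hedged sketch only: the function name `hint_is_sandwiched` and the label strings are hypothetical, and requiring the PT domain and the toxin domain to be the immediate neighbors of HINT is a stricter reading than the text strictly demands, adopted here as an assumption to keep the example short.

```python
# PT domain labels named in the text; the strings are illustrative.
PT_DOMAINS = {"PT-VENN", "PT-TG"}


def hint_is_sandwiched(domains):
    """Check the HINT placement rule on an N->C ordered domain list.

    Returns True only if a HINT domain is present, is directly preceded
    by a PT domain, and is directly followed by the final (toxin) domain.
    Direct adjacency is an assumption made for this sketch.
    """
    if "HINT" not in domains:
        return False
    i = domains.index("HINT")
    preceded_by_pt = i >= 1 and domains[i - 1] in PT_DOMAINS
    followed_by_toxin = i == len(domains) - 2
    return preceded_by_pt and followed_by_toxin
```

An architecture such as `["LXG", "PT-TG", "HINT", "HNH"]` passes the check, whereas one in which HINT precedes the PT domain does not.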

Eukaryotic/DNA viral members and structure-function 
analysis of the SUKH superfamily 

While SUKH superfamily neighborhoods are very wide- 
spread in bacteria, they are largely absent in archaea. 
Although we uncovered potential extruded nuclease 
toxins in certain halophilic archaea such as H. borinquense 
(gi: 312291883, with a GH-E nuclease domain), which are 
delivered by means of a distinctive N-terminal 
metallopeptidase domain, we did not find any immunity 
proteins of the SUKH or SuFu superfamilies. Although 
the exact reason for this exclusion is unclear, it is conceiv- 
able that these immunity proteins are ineffective in the 
context of the distinct archaeal secretory systems. 
However, several eukaryotes possess one or more SUKH 
superfamily members. Phylogenetic analysis and phyletic 
patterns suggest that there are two major eukaryotic 
lineages of the SUKH superfamily that are nested within 
the radiation of the bacterial versions (Supplementary 
Data). They are respectively prototyped by the 
polyglutamylase subunit PGs2 (22), and the vertebrate 
SCF ubiquitin E3 ligase subunit FBXO3 with yeast 
Smi1/Knr4 (21,73). The PGs2 version is found in basal 
eukaryotes such as Giardia and Spironucleus, animals 
and chlorophyte algae suggesting that it was likely to 
have been acquired prior to the last eukaryotic common 
ancestor (LECA) and subsequently lost in several lineages. 
The FBXO3 lineage is present in animals, fungi, plants, 
stramenopiles and ciliates. However, it does not group 
with the PGs2 lineage, instead grouping with other bac- 
terial forms. Hence, it was probably acquired relatively 
early in eukaryotic evolution via an independent transfer 
from bacteria. In both plants and animals the FBXO3 
version is fused to an N-terminal F-box domain and a 
distinctive C-terminal immunoglobulin superfamily 
domain (overlaps with the PFAM model DUF525), sug- 
gesting that it was recruited as an E3 subunit prior to the 
radiation of these eukaryotic groups. In addition to these 
versions, there appear to have been other sporadic trans- 
fers of SUKH superfamily members to eukaryotes. For 
example, land plants contain a version typified by the 
Arabidopsis protein At3g50340 (gi: 15229727) which 
seems to have been independently acquired by them 
from a bacterial source. Another sporadic transfer is 
seen in certain filamentous fungi, which acquired a 
version of the SUKH-4 group that has been independently 
fused to an N-terminal F-box domain (e.g. A. oryzae gi: 
169782758). DNA viral versions show no specific relation- 
ship with eukaryotic forms; instead, they share specific 
sequence motifs with the SUKH-3 group, recover them 
as best hits in profile-profile comparisons, and group 
with them in the phylogenetic tree (Supplementary 
Data). Within viruses they are most widespread and 
abundant in herpesviruses, with the versions from 
adenoviruses, poxviruses and iridoviruses being nested 
within the herpesviral radiation of the family 
(Supplementary Data). Thus, they appear to have been 
acquired first by an ancestral herpesvirus, similar to that 
inserted in the amphioxus genome (46), from a bacterial 
source and subsequently disseminated across diverse DNA 
viruses. 

Although there has been gene loss in several eukaryotic 
lineages, at least the two ancient versions, namely PGs2 
and FBXO3, appear to have been largely vertically 
inherited and show no lineage-specific expansions within 
eukaryotes. This is in sharp contrast to the high propen- 
sity for lateral transfer and for lineage-specific expansions 
of the SUKH superfamily that is observed in bacteria. 
This feature, together with the available functional 
evidence suggests that these conserved eukaryotic 
versions have acquired a biological role distinct from 
that in the toxin-immunity systems of bacteria. 
Nevertheless, there were several features that suggested 
to us that biochemically the eukaryotic versions might 
be exploiting an ancient functional template provided by 
the SUKH domains in bacterial nuclease toxin systems. 
Firstly, the studies on yeast Smi1/Knr4 have shown that it 
interacts with a large number of structurally and function- 
ally distinct proteins (19). In FBXO3, and independently 
in the above-mentioned fungal proteins, it appears in a 
domain architectural context corresponding to the part 
of the E3 F-box subunit that recognizes the substrate for 
ubiquitination (74). This suggests that it might be 
deployed as a recognition domain to recruit particular 
substrates for ubiquitination. In bacteria the SUKH 
superfamily domains are one of the most widespread 
immunity proteins that appear to function in conjunction 
with a repertoire of nuclease toxins that are extremely 
diverse in sequence and structure (Figures 3 and 4). 
Taken as a whole, these observations indicate that the 
SUKH domain contains a scaffold that has been 
adapted to recognize a diverse set of protein partners. 

A possible clue for the structural basis of this capability 
is offered by studies on the E. coli Syd protein: it has been 
shown to contain a prominent negatively charged cleft 
with which it could interact with partner proteins (45). 
Examination of the structure of this protein indicates 
that this cleft is formed by the space between the 
conserved helix H3 and the fissure in the sheet between the 
two-stranded N-terminal unit and the C-terminal 
4-stranded meander (Figure 1). Given that this unusual 
feature is seen across the fold, we examined the surface 
renderings of different SUKH superfamily members and a 
corresponding cleft is observed in most of them (Figure 1). 
Although this cleft is not necessarily negatively charged as 
in Syd, and might vary in depth and shape, its widespread 
presence suggests that it might be the means by which the 
SUKH superfamily is able to accommodate different 
protein partners. In support of this hypothesis we 
observed that in the case of two distantly related 
members of the SUKH superfamily, namely Syd (PDB: 
3ffv) and YobK (PDB: 2prv), this cleft is used in 
protein-protein interactions. In both these crystal struc- 
tures one of the monomers is bound in the cleft of the 
other monomer resulting in an asymmetric dimer 
(Supplementary Data). These dimers are unlikely to rep- 
resent biologically native dimeric states, but in any case 
illustrate the ability of the conserved cleft of the SUKH 
fold to accommodate other proteins. Interestingly, the 
SuFu superfamily also shows a comparable kind of sheet 
with a fissure between two sets of strands (69). 
Experimental studies on the Drosophila SuFu show that 
it also functions as a protein tether which holds the 
Zn-finger transcription factor Gli in the cytoplasm in the 
absence of the hedgehog signal (75). In vertebrates 
the SuFu ortholog has been shown to bind Gli2 and 
Gli3 and prevent their degradation due to ubiquitination 
by F-box E3 ligases (76). Thus, the presence of compar- 
able binding interfaces that have the flexibility to recog- 
nize a wide range of protein ligands might be a common 
feature shared by both the SUKH and the SuFu 
superfamilies of immunity proteins. It is this feature that 
appears to have resulted in them being utilized as adaptors 
for recruiting other proteins in eukaryotic regulatory 
systems. 

The extensive spread of the US22 group of the SUKH 
superfamily across unrelated or distantly related DNA 
viruses of animals suggests that it confers an important 
advantage to these viruses. This is also supported by the 
lineage-specific expansion in betaherpesviruses of the 
SUKH superfamily in the form of multigene arrays 
similar to what is seen in bacteria (Figure 2). Indeed 
multiple studies suggest that distinct copies of the 
proteins in herpesviruses are required for effective 
survival and replication of the virus in their hosts. For 
instance, mutagenesis showed that two SUKH superfamily 
paralogs, M142 and M143, of the murine cytomegalovirus 
are essential for survival of the virus itself, 
whereas mutagenesis of other paralogs, M139, M140 and 
M141, specifically prevents its replication in macrophages 
(77). Other studies indicated that M142 and M143 form a 
heterotetrameric complex which counters the action of the 
host protein kinase R (PKR) in shutting down viral 
protein synthesis (78-81). The human cytomegalovirus 
SUKH superfamily proteins TRS1 and IRS1 have been 
shown to similarly counter the PKR and the dsRNA 
dependent arm of the anti-viral response (78,82-86). 
Another paralog UL38 inhibits the host cell stress 
responses by antagonizing the tuberous sclerosis protein 
complex in the endoplasmic reticulum (87,88) and 
counters apoptosis in conjunction with yet another 
paralog UL36 (89,90). In light of these observations it 
appears that the viral versions of the SUKH superfamily 
are deployed to counter different facets of the host 
anti-viral and stress response. By analogy to the bacterial 
versions, which function as immunity proteins, we 
propose that the viral SUKH domain proteins in general 
bind diverse host proteins that are used against the virus. 
Here again the special ability of the SUKH scaffold to 
bind diverse proteins appears to have been exploited by 
the virus as a flexible binding interface to neutralize 
a diverse group of host anti-viral defenses. 

Evolutionary implications and general considerations 

Identification of the SUKH superfamily and associated 
nucleic acid modifying toxin systems has considerable im- 
plications for understanding bacterial genetic conflicts, 
evolutionary forces acting on strongly linked multi-gene 
loci, and potential biotechnological applications. We 
briefly discuss some of these implications that emerge 
directly from our observations. 

Relationship of toxin systems to genetic conflicts in the 
bacterial world. Classical colicins and earlier characterized 
CDIs act primarily on related bacterial strains of the same 
'species'. Although the systems identified in our studies are 
abundantly represented in extracellular pathogenic 
bacteria, they are rare in intracellular symbionts or patho- 
gens. This might be because intracellular bacteria are 
much less likely to encounter a heavy load of competing 
cells in the same niche. The bacterial toxin systems which 
we uncovered in this study and the related CDIs are also 
different in certain features from the classical colicin-like 
systems. Classical colicins are in large part encoded on 
plasmids, which might be either single copy, medium-sized 
conjugative plasmids or small multi-copy plasmids
that depend on the conjugative plasmids for their trans- 
mission (8). Such bacteriocins are relatively rare on 
chromosomes. In contrast, 99.25% of the systems 
recovered in our study are chromosomally encoded. 
The majority of the plasmid-encoded classical colicin-like
toxins are accompanied by a gene encoding a lysis
protein and their release is concomitant with the lysis of 
the host cell. However, none of the systems identified in 
this study or the CDIs have lysis genes in their neighbor- 
hoods (25). This difference suggests that, while both the 
plasmid-borne bacteriocins and these systems might be 
directed at close relatives, they appear to be geared 
toward distinct genetic conflicts. The lysis of the cell 
nullifies the fitness of the chromosome; hence, it would 
be largely deleterious for the chromosome to encode 
systems that require lysis. The plasmid being a selfish 
element is not completely affected by loss of fitness of 
the host as long as it can offset it by holding on to, or 
spreading in the host population (i.e. the plasmid's own 
fitness is enhanced or maintained). Cells of the host type 
without the bacteriocinogenic plasmid are competitors 
that affect the plasmid fitness, especially under stationary 
phase or starvation conditions. Hence, the plasmid-borne 
colicin would be primarily selected to act against host cells 
that have lost the plasmid or lack it by default under these 
stress conditions. Further, the plasmid toxins are unlikely 
to have ready access to trafficking by the host because, 
given the large amounts in which the colicins are produced 
(8), their export is likely to impair host fitness. Further, it 
has been shown that under starvation only ~3% of the 
cells produce colicin (91). Although the loss of the cells 
producing the colicin would endanger the resident 
plasmid, a relatively small fraction of the host population 
is affected. By the principle of inclusive fitness of kin (92), 
the plasmid could still have an enhancement of fitness
from the copies in the surviving cells along with the
elimination of competitors by the released toxin. On the other
hand, the toxin domains of many of the chromosomal 
versions like the CDIs and those identified in this study 
appear to be borne on filamentous structures that are
primarily geared toward the elimination of competitors that
come in physical contact with the cell-surface (25,93).
Therefore, these systems are likely to be critical in the 
context of the formation and organization of biofilms 
and solid substrate colonies. When bacterial cells are
aggregating in the above contexts it would benefit them to
eliminate resource sharing with non-kin competitors. Hence,
the presence of a chromosomally encoded toxin that acts at a
short range is likely to be selected, resulting in the
proliferation of systems such as those described here.
Nevertheless, it would also benefit 'cheater cells' to 
evade such defensive mechanisms. Hence, they would be 
selected to maintain a wide diversity of immunity proteins 
to counter different non-self toxins, which might explain 
the arrays of diverse SUKH genes in several bacterial 
genomes. 
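
The inclusive-fitness argument made above for plasmid-borne colicins can be illustrated with a toy calculation. This is only a sketch under stated assumptions and not a model taken from this study: the function name, the population sizes and the 30% kill rate among plasmid-free competitors are hypothetical, and only the ~3% figure for colicin-producing (and hence lysing) cells comes from the text (91).

```python
def plasmid_share(carriers, competitors, lysed_fraction, killed_fraction):
    """Fraction of surviving cells that carry the plasmid after one round
    of colicin release (toy model; all inputs are illustrative)."""
    surviving_carriers = carriers * (1.0 - lysed_fraction)
    surviving_competitors = competitors * (1.0 - killed_fraction)
    return surviving_carriers / (surviving_carriers + surviving_competitors)


# Equal starting populations (hypothetical numbers).
share_before = plasmid_share(1000, 1000, lysed_fraction=0.0, killed_fraction=0.0)

# ~3% of carriers lyse (the figure cited in the text); the 30% kill rate
# among plasmid-free competitors is an assumed value for illustration.
share_after = plasmid_share(1000, 1000, lysed_fraction=0.03, killed_fraction=0.30)

print(f"plasmid share before release: {share_before:.3f}")
print(f"plasmid share after release:  {share_after:.3f}")
```

In this simplified setting the plasmid's share of the surviving population rises whenever the fraction of competitors killed exceeds the fraction of carrier cells lost to lysis, which is the sense in which a small producing fraction can still enhance plasmid fitness.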

Potential evolutionary processes in diversification of toxins 
and immunity proteins. Imprints of the evolutionary arms 
race arising from the above processes are readily observed 
in our systems. The toxin proteins appear to show a rather 
peculiar pattern of diversification. The N-termini, which 
are typically associated with trafficking, tend to be rela- 
tively conserved while C-terminal nuclease domains show 
major diversity (Figure 5). This is consistent with a recent 
study on the diversification of RHS proteins in
enterobacteria which showed that the RHS proteins
undergo C-terminal polymorphism due to rampant re- 
combination with invading cassettes that encode alterna- 
tive C-terminal modules (94). This type of recombination 
or gene-conversion with polymorphic C-terminal cassettes 
might explain the presence of smaller loci found in the 
gene-neighborhoods characterized here that encode just 
a nuclease domain by itself or with an additional small 
N-terminal extension (Figures 2 and 5). Hence, we 
extend the original proposal for RHS diversification to 
suggest that, more generally, recombination with cassettes 
with distinct C-terminal modules is the primary proximal 
mechanism for diversification of the toxin proteins across 
all bacterial lineages (Figure 5). Furthermore, the presence 
of nuclease and nucleic acid deaminase domains as the 
primary toxin modules of these systems raises the possi- 
bility that their nucleic acid cleaving or mutating activity is 
involved in triggering recombination events. This appears 
plausible given the observations that most of these nucle- 
ases are likely to be endonucleases, which like their coun- 
terparts in the restriction-modification systems could 
cleave at specific sequences. Similarly, deaminase-induced 
mutations have been implicated in the triggering of 
class-switching recombination events in vertebrates (95). 
More generally, this ties in with earlier studies which have 
demonstrated the role for both recombination and 
positive selection in the evolution of plasmid-borne bac- 
teriocins (96). It has been proposed that pore-forming 
versions have predominantly utilized recombination for 
diversification whereas nucleases have mainly evolved 
through positive selection. In our systems, the evidence 
points to both these forces being active at different levels 
in the evolution of the toxin proteins (96). While the basic 
architectures evolve through recombination generating 
C-terminal polymorphism, the C-terminal nucleases them- 
selves show evidence for considerable sequence diversifi- 
cation within each family. Indeed, much of the 
diversification of the HNH/EndoVII fold appears to 
have happened within the context of these systems, with 
several structurally distinct forms evolving amidst the 
nuclease toxins (Figure 3). 

Phyletic and phylogenetic analysis of the SUKH super- 
family indicates three salient features, namely rampant 
lateral transfer between different branches of the bacterial 
tree, gene loss and lineage-specific expansion followed by 
divergence of the lineage-specific paralogs (Supplementary 
Data). This suggests that there is a notable trend for main- 
taining diversity within the SUKH superfamily that 
probably arises from selection for recognition of a 
diverse range of nucleic acid-modifying toxins. Although 
there are multiple distinct types of immunity proteins 
known from plasmid-borne bacteriocins and CDI 
systems, most show very limited phyletic patterns. For 
example, the CdiI immunity protein seen in several CDI systems is
entirely limited to proteobacteria (25). We observed that 
it is a protein with two TM segments that is likely to form 
a membrane channel (Supplementary Data) and have a 
mode of action very distinct from the SUKH superfamily. 
As only the SUKH superfamily and, to a certain extent, the
SuFu superfamily show a pattern of wide dissemination
across bacteria, it is likely that only these scaffolds can
support sufficient diversification that goes hand in hand
with the polymorphism of the toxin domains.

Implications for eukaryotic and viral functions. Our obser- 
vations also suggest that the biochemical diversity 
generated within these bacterial toxin systems has been 
taken up and utilized for very different functions by eu- 
karyotes and their viruses. Both the SUKH and the SuFu 
superfamily domains have been utilized as adaptors that 
regulate recognition of different substrates by protein 
modification systems such as ubiquitination and 
polyglutamylation. In a completely different context, the 
HINT domains derived from such bacterial toxin systems 
appear to have been used to release peptide messengers in 
animal signaling pathways, like the hedgehog pathway 
(70). The nuclease domains ultimately derived from 
various toxins also appear to have been used for different 
functions by eukaryotes and their viruses. The EndoU 
nuclease domain, which ultimately emerged from these 
toxin systems, has been recruited by the nidoviruses for 
the replication of their positive-strand RNA genome,
whereas a related domain was recruited by eukaryotes 
for processing of certain snRNAs. We also observed 
that a HNH/EndoVII fold nuclease found in the bacterial 
toxin typified by the N. gonorrhoeae protein NGO1392
is found in several eukaryotic lineages such as ani- 
mals, plants, stramenopiles and apicomplexans 
(Supplementary Data). Given its conservation and rela- 
tively lower divergence, it is unlikely that the nuclease 
functions as a toxin in eukaryotes. However, it is 
possible that it has been recruited as a DNA-repair 
enzyme, as has been previously observed in the case of 
certain nucleases of bacterial restriction-modification and 
phage replication systems (97). In general terms, these
observations suggest that the origin of key systems in
eukaryotes, including those related to the emergence of
certain lineages, such as animals (i.e. the hedgehog
pathway), appears to have extensively benefited from the
availability of 'pre-adaptations' in the form of
components whose ultimate origins lay in these toxin systems.

CONCLUDING REMARKS 

The current study points to the remarkable flexibility of 
SUKH domains in mediating different protein-protein 
interactions. In a sense, this situation resembles what 
has earlier been observed with certain scaffolds like the 
immunoglobulin domain and the leucine-rich repeats of 
various immunity-related proteins of eukaryotes (98,99). 
The ability of the SUKH scaffold to accommodate diverse 
binding partners makes it a potential candidate as a 
template for protein engineering to generate novel 
binding capabilities. Likewise, the C-terminal diversifica- 
tion of the toxin domain could also have biotechnological 
utility as a model for generating secreted proteins that 
differ extensively in a given module but retain a constant 
N-terminal part. We hope that this characterization of the 
SUKH superfamily and identification of the associated 
nuclease toxin families provides new leads for the future 
exploration of the manifold implications of the systems 
discussed here. 






SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health Postdoctoral Visiting 
Fellowship; intramural funds of the National Library of 
Medicine at the National Institutes of Health, USA. 
Funding for open access charge: Intramural funds of the 
National Institutes of Health, USA. 

Conflict of interest statement. None declared. 



REFERENCES 

1. Stirpe,F., Barbieri,L., Battelli,M.G., Soria,M. and Lappi,D.A.
(1992) Ribosome-inactivating proteins from plants: present status
and future prospects. Biotechnology, 10, 405-412.

2. Endo,Y. and Tsurugi,K. (1986) Mechanism of action of ricin and
related toxic lectins on eukaryotic ribosomes. Nucleic Acids Symp.
Ser., 187-190.

3. Endo,Y., Huber,P.W. and Wool,I.G. (1983) The ribonuclease
activity of the cytotoxin alpha-sarcin. The characteristics of the
enzymatic activity of alpha-sarcin with ribosomes and ribonucleic
acids as substrates. J. Biol. Chem., 258, 2662-2667.

4. Dhananjaya,B.L. and D'souza,C.J. (2010) An overview on 
nucleases (DNase, RNase, and phosphodiesterase) in snake 
venoms. Biochemistry, 75, 1-6. 

5. Rosenberg,H.F. (2008) RNase A ribonucleases and host defense:
an evolving story. J. Leukoc. Biol., 83, 1079-1087.

6. Alouf,J.E. and Popoff,M.R. (2006) The comprehensive sourcebook
of bacterial protein toxins, 3rd edn. Elsevier/Academic Press,
Amsterdam/Boston.

7. Riley,M.A. (1998) Molecular mechanisms of bacteriocin evolution.
Annu. Rev. Genet., 32, 255-278.

8. Cascales,E., Buchanan,S.K., Duche,D., Kleanthous,C.,
Lloubes,R., Postle,K., Riley,M., Slatin,S. and Cavard,D. (2007)
Colicin biology. Microbiol. Mol. Biol. Rev., 71, 158-229.

9. Anantharaman,V. and Aravind,L. (2003) New connections in the
prokaryotic toxin-antitoxin network: relationship with the
eukaryotic nonsense-mediated RNA decay system. Genome Biol.,
4, R81.

10. Kobayashi,I. (2001) Behavior of restriction-modification systems
as selfish mobile elements and their impact on genome evolution.
Nucleic Acids Res., 29, 3742-3756.

11. Engelberg-Kulka,H. and Glaser,G. (1999) Addiction modules and
programmed cell death and antideath in bacterial cultures. Annu.
Rev. Microbiol., 53, 43-70.

12. Jensen,R.B. and Gerdes,K. (1995) Programmed cell death in
bacteria: proteic plasmid stabilization systems. Mol. Microbiol.,
17, 205-210.

13. Burroughs,A.M., Iyer,L.M. and Aravind,L. (2009) Natural history
of the E1-like superfamily: implication for adenylation, sulfur
transfer, and ubiquitin conjugation. Proteins, 75, 895-910.

14. Iyer,L.M., Burroughs,A.M. and Aravind,L. (2006) The
prokaryotic antecedents of the ubiquitin-signaling system and the
early evolution of ubiquitin-like beta-grasp domains.
Genome Biol., 7, R60.

15. Kawano,M., Aravind,L. and Storz,G. (2007) An antisense RNA
controls synthesis of an SOS-induced toxin evolved from an
antitoxin. Mol. Microbiol., 64, 738-754.

16. Yamamoto,T., Hiratani,T., Hirata,H., Imai,M. and Yamaguchi,H.
(1986) Killer toxin from Hansenula mrakii selectively inhibits cell
wall synthesis in a sensitive yeast. FEBS Lett., 197, 50-54.

17. Hong,Z., Mann,P., Brown,N.H., Tran,L.E., Shaw,K.J., Hare,R.S.
and DiDomenico,B. (1994) Cloning and characterization of
KNR4, a yeast gene involved in (1,3)-beta-glucan synthesis.
Mol. Cell. Biol., 14, 1017-1025.

18. Dagkessamanskaia,A., Martin-Yken,H., Basmaji,F., Briza,P. and
Francois,J. (2001) Interaction of Knr4 protein, a protein involved
in cell wall synthesis, with tyrosine tRNA synthetase encoded by
TYS1 in Saccharomyces cerevisiae. FEMS Microbiol. Lett., 200,
53-58.

19. Basmaji,F., Martin-Yken,H., Durand,F., Dagkessamanskaia,A.,
Pichereaux,C., Rossignol,M. and Francois,J. (2006) The
'interactome' of the Knr4/Smi1, a protein implicated in
coordinating cell wall synthesis with bud emergence in
Saccharomyces cerevisiae. Mol. Genet. Genomics, 275, 217-230.

20. Dagkessamanskaia,A., Durand,F., Uversky,V.N., Binda,M.,
Lopez,F., El Azzouzi,K., Francois,J.M. and Martin-Yken,H.
(2010) Functional dissection of an intrinsically disordered protein:
understanding the roles of different domains of Knr4 protein in
protein-protein interactions. Protein Sci., 19, 1376-1385.

21. Shima,Y., Shima,T., Chiba,T., Irimura,T., Pandolfi,P.P. and
Kitabayashi,I. (2008) PML activates transcription by protecting
HIPK2 and p300 from SCFFbx3-mediated degradation.
Mol. Cell. Biol., 28, 7126-7138.

22. Janke,C., Rogowski,K., Wloga,D., Regnard,C., Kajava,A.V.,
Strub,J.M., Temurak,N., van Dijk,J., Boucher,D., van
Dorsselaer,A. et al. (2005) Tubulin polyglutamylase enzymes are
members of the TTL domain protein family. Science, 308,
1758-1762.

23. Iyer,L.M., Abhiman,S., Maxwell Burroughs,A. and Aravind,L.
(2009) Amidoligases with ATP-grasp, glutamine synthetase-like
and acetyltransferase-like domains: synthesis of novel metabolites
and peptide modifications of proteins. Mol. Biosyst., 5,
1636-1660.

24. Aoki,S.K., Pamma,R., Hernday,A.D., Bickham,J.E., Braaten,B.A.
and Low,D.A. (2005) Contact-dependent inhibition of growth in
Escherichia coli. Science, 309, 1245-1248.

25. Aoki,S.K., Diner,E.J., de Roodenbeke,C.T., Burgess,B.R.,
Poole,S.J., Braaten,B.A., Jones,A.M., Webb,J.S., Hayes,C.S.,
Cotter,P.A. et al. (2010) A widespread family of polymorphic
contact-dependent toxin delivery systems in bacteria. Nature, 468,
439-442.

26. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z.,
Miller,W. and Lipman,D.J. (1997) Gapped BLAST and
PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res., 25, 3389-3402.

27. Soding,J., Biegert,A. and Lupas,A.N. (2005) The
HHpred interactive server for protein homology
detection and structure prediction. Nucleic Acids Res., 33,
W244-W248.

28. Holm,L., Kaariainen,S., Rosenstrom,P. and Schenkel,A. (2008)
Searching protein structure databases with DaliLite v. 3.
Bioinformatics, 24, 2780-2781.

29. Edgar,R.C. (2004) MUSCLE: multiple sequence alignment with
high accuracy and high throughput. Nucleic Acids Res., 32,
1792-1797.

30. Pei,J. and Grishin,N.V. (2007) PROMALS: towards accurate
multiple sequence alignments of distantly related proteins.
Bioinformatics, 23, 802-808.

31. Lassmann,T. and Sonnhammer,E.L. (2005) Kalign - an accurate
and fast multiple sequence alignment algorithm.
BMC Bioinformatics, 6, 298.

32. Pei,J., Sadreyev,R. and Grishin,N.V. (2003) PCMA: fast and
accurate multiple sequence alignment based on profile consistency.
Bioinformatics, 19, 427-428.

33. Goodstadt,L. and Ponting,C.P. (2001) CHROMA:
consensus-based colouring of multiple alignments for publication.
Bioinformatics, 17, 845-846.

34. Cuff,J.A., Clamp,M.E., Siddiqui,A.S., Finlay,M. and Barton,G.J.
(1998) JPred: a consensus secondary structure prediction server.
Bioinformatics, 14, 892-893.

35. Jones,D.T. (1999) Protein secondary structure prediction based on
position-specific scoring matrices. J. Mol. Biol., 292, 195-202.

36. Finn,R.D., Mistry,J., Tate,J., Coggill,P., Heger,A., Pollington,J.E.,
Gavin,O.L., Gunasekaran,P., Ceric,G., Forslund,K. et al. (2009)
The Pfam protein families database. Nucleic Acids Res., 38,
D211-D222.

37. Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L.
(2001) Predicting transmembrane protein topology with a hidden
Markov model: application to complete genomes. J. Mol. Biol.,
305, 567-580.






38. Kall,L., Krogh,A. and Sonnhammer,E.L. (2005) An HMM 
posterior decoder for sequence feature prediction that includes 
homology information. Bioinformatics, 21(Suppl. 1), i251-i257. 

39. Price,M.N., Dehal,P.S. and Arkin,A.P. (2009) FastTree:
computing large minimum evolution trees with profiles instead of
a distance matrix. Mol. Biol. Evol., 26, 1641-1650.

40. Sali,A. and Blundell,T.L. (1993) Comparative protein modelling
by satisfaction of spatial restraints. J. Mol. Biol., 234, 779-815.

41. Humphrey,W., Dalke,A. and Schulten,K. (1996) VMD: visual
molecular dynamics. J. Mol. Graphics, 14, 33-38.

42. Wootton,J.C. and Federhen,S. (1996) Analysis of compositionally
biased regions in sequence databases. Methods Enzymol., 266,
554-571.

43. Shimoike,T., Taura,T., Kihara,A., Yoshihisa,T., Akiyama,Y., 
Cannon,K. and Ito,K. (1995) Product of a new gene, syd, 
functionally interacts with SecY when overproduced in 
Escherichia coli. J. Mol. Biol., 270, 5519-5526. 

44. Matsuo,E., Mori,H., Shimoike,T. and Ito,K. (1998) Syd, a
SecY-interacting protein, excludes SecA from the SecYE complex
with an altered SecY24 subunit. J. Mol. Biol., 273, 18835-18840.

45. Dalal,K., Nguyen,N., Alami,M., Tan,J., Moraes,T.F., Lee,W.C.,
Maurus,R., Sligar,S.S., Brayer,G.D. and Duong,F. (2009)
Structure, binding, and activity of Syd, a SecY-interacting
protein. J. Mol. Biol., 284, 7897-7902.

46. de Souza,R.F., Iyer,L.M. and Aravind,L. (2010) Diversity and
evolution of chromatin proteins encoded by DNA viruses.
Biochim. Biophys. Acta, 1799, 302-318.

47. Ye,Y., Osterman,A., Overbeek,R. and Godzik,A. (2005)
Automatic detection of subsystem/pathway variants in genome
analysis. Bioinformatics, 21(Suppl. 1), i478-i486.

48. Roberts,R.J., Vincze,T., Posfai,J. and Macelis,D. (2010)
REBASE - a database for DNA restriction and modification:
enzymes, genes and genomes. Nucleic Acids Res., 38, D234-D236.

49. Osbourn,A.E. and Field,B. (2009) Operons. Cell. Mol. Life Sci., 
66, 3755-3775. 

50. Shlyapnikov,S.V., Lunin,V.V., Perbandt,M., Polyakov,K.M.,
Lunin,V.Y., Levdikov,V.M., Betzel,C. and Mikhailov,A.M. (2000)
Atomic structure of the Serratia marcescens endonuclease at 1.1
Å resolution and the enzyme reaction mechanism.
Acta Crystallogr. D Biol. Crystallogr., 56, 567-572.

51. Ghosh,M., Meiss,G., Pingoud,A., London,R.E. and Pedersen,L.C.
(2005) Structural insights into the mechanism of nuclease A, a
betabeta alpha metal nuclease from Anabaena. J. Mol. Biol., 280,
27990-27997.

52. Aravind,L., Makarova,K.S. and Koonin,E.V. (2000) Holliday
junction resolvases and related nucleases: identification of new
families, phyletic distribution and evolutionary trajectories.
Nucleic Acids Res., 28, 3417-3432.

53. Makhov,A.M., Hannah,J.H., Brennan,M.J., Trus,B.L., Kocsis,E.,
Conway,J.F., Wingfield,P.T., Simon,M.N. and Steven,A.C. (1994)
Filamentous hemagglutinin of Bordetella pertussis. A bacterial
adhesin formed as a 50-nm monomeric rigid rod based on a
19-residue repeat motif rich in beta strands and turns.
J. Mol. Biol., 241, 110-124.

54. Beckmann,G., Hanke,J., Bork,P. and Reich,J.G. (1998) Merging
extracellular domains: fold prediction for laminin G-like and
amino-terminal thrombospondin-like modules based on homology
to pentraxins. J. Mol. Biol., 275, 725-730.

55. Jacob-Dubuisson,F., Locht,C. and Antoine,R. (2001) Two-partner 
secretion in Gram-negative bacteria: a thrifty, specific pathway 
for large virulence proteins. Mol. Microbiol., 40, 306-313. 

56. Krishna,S.S., Majumdar,I. and Grishin,N.V. (2003) Structural
classification of zinc fingers: survey and summary.
Nucleic Acids Res., 31, 532-550.

57. Andreeva,A., Howorth,D., Chandonia,J.M., Brenner,S.E.,
Hubbard,T.J., Chothia,C. and Murzin,A.G. (2008) The SCOP
database. Nucleic Acids Res., 36, D419-D425.

58. Aravind,L., Makarova,K.S. and Koonin,E.V. (2000) Survey and
summary: holliday junction resolvases and related nucleases:
identification of new families, phyletic distribution and
evolutionary trajectories. Nucleic Acids Res., 28, 3417-3432.

59. Sokolowska,M., Czapinska,H. and Bochtler,M. (2009) Crystal
structure of the beta beta alpha-Me type II restriction
endonuclease Hpy99I with target DNA. Nucleic Acids Res., 37,
3799-3810.

60. Renzi,F., Caffarelli,E., Laneve,P., Bozzoni,I., Brunori,M. and
Vallone,B. (2006) The structure of the endoribonuclease XendoU:
From small nucleolar RNA processing to severe acute respiratory
syndrome coronavirus replication. Proc. Natl. Acad. Sci. USA,
103, 12365-12370.

61. Ricagno,S., Egloff,M.P., Ulferts,R., Coutard,B., Nurizzo,D.,
Campanacci,V., Cambillau,C., Ziebuhr,J. and Canard,B. (2006)
Crystal structure and mechanistic determinants of SARS
coronavirus nonstructural protein 15 define an endoribonuclease
family. Proc. Natl. Acad. Sci. USA, 103, 11892-11897.

62. Wang,J., Chen,R. and Julin,D.A. (2000) A single nuclease active
site of the Escherichia coli RecBCD enzyme catalyzes
single-stranded DNA degradation in both directions.
J. Mol. Biol., 275, 507-513.

63. Carr,S., Walker,D., James,R., Kleanthous,C. and Hemmings,A.M.
(2000) Inhibition of a ribosome-inactivating ribonuclease: the
crystal structure of the cytotoxic domain of colicin E3 in complex
with its immunity protein. Structure, 8, 949-960.

64. Delattre,A.S., Clantin,B., Saint,N., Locht,C., Villeret,V. and
Jacob-Dubuisson,F. (2010) Functional importance of a conserved
sequence motif in FhaC, a prototypic member of the TpsB/
Omp85 superfamily. FEBS J., 277, 4755-4765.

65. Pallen,M.J. (2002) The ESAT-6/WXG100 superfamily - and a new
Gram-positive secretion system? Trends Microbiol., 10, 209-212.

66. Iyer,L.M., Makarova,K.S., Koonin,E.V. and Aravind,L. (2004)
Comparative genomics of the FtsK-HerA superfamily of pumping
ATPases: implications for the origins of chromosome segregation,
cell division and viral capsid packaging. Nucleic Acids Res., 32,
5260-5279.

67. Peterson,F.C. and Volkman,B.F. (2009) Diversity of polyproline 
recognition by EVH1 domains. Front Biosci., 14, 833-846. 

68. Pallen,M.J., Chaudhuri,R.R. and Henderson,I.R. (2003) Genomic
analysis of secretion systems. Curr. Opin. Microbiol., 6, 519-527.

69. Das,D., Finn,R.D., Abdubek,P., Astakhova,T., Axelrod,H.L.,
Bakolitsa,C., Cai,X., Carlton,D., Chen,C., Chiu,H.J. et al. (2010)
The crystal structure of a bacterial Sufu-like protein defines a
novel group of bacterial proteins that are similar to the
N-terminal domain of human Sufu. Protein Sci., 19, 2131-2140.

70. Burglin,T.R. (2008) The Hedgehog protein family. Genome Biol.,
9, 241.

71. Perler,F.B. (1998) Protein splicing of inteins and hedgehog
autoproteolysis: structure, function, and evolution. Cell, 92, 1-4.

72. Hall,T.M., Porter,J.A., Young,K.E., Koonin,E.V., Beachy,P.A.
and Leahy,D.J. (1997) Crystal structure of a Hedgehog
autoprocessing domain: homology between Hedgehog and
self-splicing proteins. Cell, 91, 85-97.

73. Cenciarelli,C., Chiaur,D.S., Guardavaccaro,D., Parks,W.,
Vidal,M. and Pagano,M. (1999) Identification of a family of
human F-box proteins. Curr. Biol., 9, 1177-1179.

74. Bai,C., Sen,P., Hofmann,K., Ma,L., Goebl,M., Harper,J.W. and
Elledge,S.J. (1996) SKP1 connects cell cycle regulators to the
ubiquitin proteolysis machinery through a novel motif, the F-box.
Cell, 86, 263-274.

75. Tukachinsky,H., Lopez,L.V. and Salic,A. (2010) A mechanism for
vertebrate Hedgehog signaling: recruitment to cilia and
dissociation of SuFu-Gli protein complexes. J. Cell. Biol., 191,
415-428.

76. Wang,C., Pan,Y. and Wang,B. (2010) Suppressor of fused and
Spop regulate the stability, processing and function of Gli2 and
Gli3 full-length activators but not their repressors. Development,
137, 2001-2009.

77. Menard,C., Wagner,M., Ruzsics,Z., Holak,K., Brune,W.,
Campbell,A.E. and Koszinowski,U.H. (2003) Role of murine
cytomegalovirus US22 gene family members in replication in
macrophages. J. Virol., 77, 5557-5570.

78. Valchanova,R.S., Picard-Maureau,M., Budt,M. and Brune,W.
(2006) Murine cytomegalovirus m142 and m143 are both required
to block protein kinase R-mediated shutdown of protein
synthesis. J. Virol., 80, 10181-10190.

79. Budt,M., Niederstadt,L., Valchanova,R.S., Jonjic,S. and Brune,W.
(2009) Specific inhibition of the PKR-mediated antiviral response
by the murine cytomegalovirus proteins m142 and m143.
J. Virol., 83, 1260-1270.

80. Child,S.J. and Geballe,A.P. (2009) Binding and relocalization of
protein kinase R by murine cytomegalovirus. J. Virol., 83,
1790-1799.

81. Child,S.J., Hanson,L.K., Brown,C.E., Janzen,D.M. and
Geballe,A.P. (2006) Double-stranded RNA binding by a
heterodimeric complex of murine cytomegalovirus m142 and
m143 proteins. J. Virol., 80, 10173-10180.

82. Hakki,M., Marshall,E.E., De Niro,K.L. and Geballe,A.P. (2006) 
Binding and nuclear relocalization of protein kinase R by human 
cytomegalovirus TRS1. J. Virol., 80, 11817-11826. 

83. Child,S.J., Hakki,M., De Niro,K.L. and Geballe,A.P. (2004)
Evasion of cellular antiviral responses by human cytomegalovirus
TRS1 and IRS1. J. Virol., 78, 197-205.

84. Hakki,M. and Geballe,A.P. (2005) Double-stranded RNA binding
by human cytomegalovirus pTRS1. J. Virol., 79, 7311-7318.

85. Marshall,E.E., Bierle,C.J., Brune,W. and Geballe,A.P. (2009)
Essential role for either TRS1 or IRS1 in human cytomegalovirus
replication. J. Virol., 83, 4112-4120.

86. Cassady,K.A. (2005) Human cytomegalovirus TRS1 and IRS1 
gene products block the double-stranded-RNA-activated host 
protein shutoff response induced by herpes simplex virus type 1 
infection. J. Virol., 79, 8707-8715. 

87. Xuan,B., Qian,Z., Torigoi,E. and Yu,D. (2009) Human
cytomegalovirus protein pUL38 induces ATF4 expression,
inhibits persistent JNK phosphorylation, and suppresses
endoplasmic reticulum stress-induced cell death. J. Virol., 83,
3463-3474.

88. Moorman,N.J., Cristea,I.M., Terhune,S.S., Rout,M.P., Chait,B.T.
and Shenk,T. (2008) Human cytomegalovirus protein UL38
inhibits host cell stress responses by antagonizing the tuberous
sclerosis protein complex. Cell Host Microbe, 3, 253-262.

89. Terhune,S., Torigoi,E., Moorman,N., Silva,M., Qian,Z., Shenk,T.
and Yu,D. (2007) Human cytomegalovirus UL38 protein blocks
apoptosis. J. Virol., 81, 3109-3123.

90. McCormick,A.L., Roback,L., Livingston-Rosanoff,D. and
St Clair,C. (2010) The human cytomegalovirus UL36 gene
controls caspase-dependent and -independent cell death programs
activated by infection of monocytes differentiating to
macrophages. J. Virol., 84, 5108-5123.

91. Mulec,J., Podlesek,Z., Mrak,P., Kopitar,A., Ihan,A. and
Zgur-Bertok,D. (2003) A cka-gfp transcriptional fusion reveals that
the colicin K activity gene is induced in only 3 percent of the
population. J. Bacteriol., 185, 654-659.

92. Dugatkin,L.A. (2007) Inclusive fitness theory from Darwin to 
Hamilton. Genetics, 176, 1375-1380. 

93. Hayes,C.S., Aoki,S.K. and Low,D.A. (2010) Bacterial
contact-dependent delivery systems. Annu. Rev. Genet., 44,
71-90.

94. Jackson,A.P., Thomas,G.H., Parkhill,J. and Thomson,N.R.
(2009) Evolutionary diversification of an ancient gene family
(rhs) through C-terminal displacement. BMC Genomics, 10, 584.

95. Conticello,S.G. (2008) The AID/APOBEC family of nucleic acid
mutators. Genome Biol., 9, 229.

96. Tan,Y. and Riley,M.A. (1997) Nucleotide polymorphism in
colicin E2 gene clusters: evidence for nonneutral evolution.
Mol. Biol. Evol., 14, 666-673.

97. Iyer,L.M., Babu,M.M. and Aravind,L. (2006) The HIRAN 
domain and recruitment of chromatin remodeling and repair 
activities to damaged DNA. Cell Cycle, 5, 775-782. 

98. Hamill,S.J., Cota,E., Chothia,C. and Clarke,J. (2000)
Conservation of folding and stability within a protein family: the
tyrosine corner as an evolutionary cul-de-sac. J. Mol. Biol., 295,
641-649.

99. Velikovsky,C.A., Deng,L., Tasumi,S., Iyer,L.M., Kerzic,M.C.,
Aravind,L., Pancer,Z. and Mariuzza,R.A. (2009)
Structure of a lamprey variable lymphocyte receptor in
complex with a protein antigen. Nat. Struct. Mol. Biol., 16,
725-730.



